





DEPARTMENT OF COMPUTER SCIENCE 
COLLEGE OF SCIENCES 
OLD DOMINION UNIVERSITY 
NORFOLK, VIRGINIA 23529 


BUILDING A GENERALIZED DISTRIBUTED SYSTEM MODEL 

By 

Ravi Mukkamala, Principal Investigator 
Progress Report 

For the period ended July 31, 1991 


Prepared for 

National Aeronautics and Space Administration 
Langley Research Center 
Hampton, Virginia 23665 


Under 

Research Grant NAG-1-1114

Wayne H. Bryant, Technical Monitor 
ISD-Systems Architecture Branch 


(NASA-CR-188181) BUILDING A GENERALIZED
DISTRIBUTED SYSTEM MODEL Progress Report,
period ended 31 Jul. 1991 (Old Dominion
Univ.) 70 p CSCL 05B

N91-23970
THRU
N91-23975
Unclas
G3/62 0012202


May 1991 



Submitted by the 

Old Dominion University Research Foundation 

P.O. Box 6369 

Norfolk, Virginia 23508-0369 


May 1991 


N91-23971


Building a Generalized Distributed System Model 

R. Mukkamala 
E.C. Foudriat 

Department of Computer Science 
Old Dominion University 
Norfolk, Virginia 23529. 


Annual Report and Renewal Request 


Abstract 

The key elements in the first year (1990-91) of our project were: 

• Investigate the effects of modeling on distributed system performance 
predictions. 

• Look at possible graphical interfaces to the proposed distributed pro- 
totype and simulator system. 

• Conduct preliminary studies towards the design of a generalized dis- 
tributed system. 

In the second year of the project (1991-92), we propose to 

• Develop detailed designs for the prototype. 

• Implement and test the system. 

• Conduct further studies on modeling distributed systems. 

1 Introduction


In the 1990-91 proposal, we discussed the need for building a modeling tool 
for both analysis and design of distributed systems. To this end, we have 
been considering different design architectures for the modeling tool. Since 
many of the research institutions have access to networks of workstations, we 
have decided to build a tool running on top of the workstations to function 
as a prototype as well as a distributed simulator for a computing system. 

In addition, we have been investigating the effects of system modeling
on performance prediction in distributed systems. While some performance
measures, such as the average size of the participating node set of a
distributed transaction, are not very sensitive to the underlying model,
measures such as transaction commutativity are quite sensitive to the
evaluation models.

We have also considered the effects of static locking and deadlocks on the
performance predictions of distributed transactions. While the probability
of deadlock is quite small in a typical distributed system, its effects
on performance could be significant.

In this report, we summarize our progress in these three areas and de- 
scribe the details of the proposed work. 

2 Distributed System Model: Prototype/Simulator 

The main goals of our efforts in building a general tool for simulation and 
prototyping of distributed systems are: 

• A framework to experiment with distributed algorithms/systems. 

• Implement in terms of basic primitives (e.g., RPC, reliable communi- 
cation). 

• A good user interface - preferably with graphic and mouse functions. 

• Provisions to include user specific code for different components. 

• A library of procedures representing typical options for components 
(e.g. two-phase locking). 

• A base for distributed simulation as well as prototyping. 

• Efficient mechanisms to monitor and display the activities. 

• Powerful performance analysis tools.

To this end, we started looking at a transaction oriented distributed 
system. Since our aim is to provide a general framework rather than to 
provide a solution to a particular model, our goal is to provide some of 
the basic primitives at the bottom layer, and let the user build the needed 
upper level software. To make the prototype usable for a novice user, we 
propose to provide a graphic interface through which a user can specify the 
system configuration. As an example application, we considered distributed 
database system modeling. As shown in Figure 1, we identified seven ma- 
jor components. Each of these components can be further described in a 
detailed model. For example, the local manager can be modeled as a coor- 
dinator of local concurrency control manager and the transaction resource 
manager. Given a set of components, the control structure of the system 
can be represented through directional links. Figure 2 illustrates one such 
control structure. 

After considering several alternatives, we decided to base the graphic
interface along the lines of the MIT Network Simulator. The MIT simulator
was developed at the Massachusetts Institute of Technology with funding
from DARPA. Even though it is intended for simulating communication
networks, we have decided to adopt its graphic interfacing routines for our
distributed simulator. Since the source code (in C) is available, we are
modifying this code to suit our needs. Some of the distinguishing
characteristics of the network simulator are:

• Internetwork simulator 

• Components include gateways, network links, hosts, TCPs and users. 

• Network configuration is displayed on the screen. 

• User can control the simulation. 

• Network configuration can be modified with the mouse. 

• Other simulation parameters can be changed on-line using the mouse. 

• Network configuration can be saved for later use. 

• Several performance measures may be printed. 

Figure 1

Since process communication is a basic primitive needed in distributed
systems, we have decided to provide this as a basic mechanism in our system. 
Currently, we are experimenting with the Sun RPC system calls to design a 
high-level primitive. RPC has several advantages including: 

• Hiding details of network programming 

• Availability of library routines 

• Hiding the operating system dependencies 

• Availability of the standard data representation using XDR format 
which allows a simple way of transferring data. 

3 Effects of Modeling on Performance Predic- 
tions 

As a second part of our study, we have conducted investigations to deter- 
mine the impact of modeling on distributed system performance. Here, we 
summarize the results of two such studies: 

Study 1: Effect of Data Distribution Models on Transaction Com- 
mutativity [2]. Recognizing commutativity among transactions appears 
to reduce the number of rollbacks (at the time of merge) in a partitioned 
distributed database system [1]. The main objective of this study is to de- 
termine the impact of data distribution modeling on the evaluation of the 
benefits due to commutativity. We studied the effects of six distinct data 
distribution models on the evaluation of the number of rollbacks. We de- 
rived closed form expressions for five of the six models, and used simulation 
for the sixth model. The conclusions from this study are summarized as 
follows. 

• Random data models that assume only average information about the 
system result in conservative estimates of system throughput. 

• Adding more system information does not necessarily lead to better 
approximations. In this paper, the system information is increased 
from model 6 to model 2. Even though this increases the computa- 
tional complexity, it does not result in any significant improvement in 
the estimation of the number of rollbacks. 

• Transaction commutativity appears to significantly reduce transaction
rollbacks in a partitioned distributed database system. This fact is 
only evident from the analysis of model 1. On the other hand, when 
we look at models 2-6, it is possible to conclude that commutativity is 
not helpful unless it is extremely high. Thus, conclusions from model 
1 and models 2-6 are contradictory. 

• The replication distribution (i.e., the actual number of copies for each
object) seems to affect the evaluations significantly. Thus, accurate
modeling of this distribution is vital to evaluation of rollbacks.

Study 2: Effect of Data Distribution and Reliability Models on 
Transaction Availability [3]. In this study, we selected three abstractions 
for data distribution modeling and three for node reliability modeling, and 
constructed six system models. Here, transaction availability is defined as 
the probability with which all data copies required by a transaction are 
available at the beginning of its execution. As before, we could derive closed 
form expressions with five of the six models (using probabilistic analysis), 
and used simulation for the other model. A transaction was characterized 
by the number of data objects that it accesses, s. The conclusions derived 
from this study are summarized as follows. 

• By choosing a proper distributed database model, the computational 
complexity of transaction availability evaluations can be significantly 
reduced. 

• For values of s < 10, all models result in almost the same transaction 
evaluation. 

• The degree of replication of individual (or group) data objects seems 
to have a significant effect on transaction availabilities. Thus, when 
different data objects have different copies, adopting average degree of 
replication at the system level may not result in accurate availability 
evaluations. 

• The actual distribution of data object copies has some, if not signifi- 
cant, impact on availability evaluation. 

• In a heterogeneous environment where different nodes may have dif- 
ferent reliabilities, it is sufficient to represent each node by the average 
node reliability, without affecting the availability evaluations. 
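
The observation above about per-object replication can be illustrated with a small numerical sketch. It is not the analysis of [3]; it assumes, purely for illustration, that a transaction needs at least one reachable copy of each object it accesses, that copies sit on distinct nodes which fail independently, and that all nodes share one reliability value. The function name and the sample numbers are hypothetical.

```python
def availability(copies_per_item, node_reliability):
    """P(every accessed item has at least one reachable copy).

    Assumption (not from the report): copies of an item are on distinct
    nodes that are up independently with probability node_reliability.
    """
    result = 1.0
    for c in copies_per_item:
        # An item is unavailable only if all c of its copies are down.
        result *= 1.0 - (1.0 - node_reliability) ** c
    return result


# Two accessed items with 1 and 5 copies, versus the same average degree
# of replication (3 copies each) applied at the system level.
actual = availability([1, 5], 0.9)
averaged = availability([3, 3], 0.9)
```

Under these assumptions the averaged model reports a noticeably higher availability than the per-item model, which is the kind of discrepancy the third conclusion warns about.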

Having conducted these studies, we conclude that

• Adopting simple models may drastically reduce the complexity of met- 
ric evaluations. 

• Choosing analytically tractable models enables easy interpretation of 
functional dependencies. 

• By choosing inappropriate models, for either analytical tractability or 
conceptual simplicity, it is possible to arrive at incorrect conclusions. 

• Model choice is highly dependent on the metric. While a simple model 
serves well for one metric, it may be insufficient for another metric. 

4 Determining the Effects of Locking on Distributed 
Transactions 

Deadlocks are known to deteriorate performance in both centralized and
distributed database systems [4,5]. In spite of this, several performance
studies have ignored the deadlock problem in their analyses [6]. In [4], Shyu
and Li proposed an elegant technique to evaluate the response time and
throughput of transactions in a non-replicated DDS. Assuming exclusive
locking (i.e., only write operations), they model the queue of lock requests
at an object as an M/M/1 queue. This results in a closed form for the waiting
time distribution at a node, expressed in terms of the average rates of arrivals
of requests and the average lock-holding time.
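
As a rough illustration of what such a closed form looks like, the sketch below evaluates the textbook M/M/1 results for the delay a lock request spends waiting in the queue. It is not taken from [4]; the function name and parameters are hypothetical, and it assumes Poisson arrivals of lock requests and exponentially distributed lock-holding times.

```python
import math


def mm1_wait_stats(arrival_rate, mean_hold_time):
    """Textbook M/M/1 queueing delay for lock requests at one object.

    Returns (utilization, mean waiting time in queue, tail function).
    """
    service_rate = 1.0 / mean_hold_time        # mu = 1 / mean lock-holding time
    rho = arrival_rate / service_rate          # utilization of the lock
    if rho >= 1.0:
        raise ValueError("queue is unstable: utilization must be below 1")
    mean_wait = rho / (service_rate - arrival_rate)

    def prob_wait_exceeds(t):
        # P(W_q > t) = rho * exp(-(mu - lambda) * t) for t >= 0
        return rho * math.exp(-(service_rate - arrival_rate) * t)

    return rho, mean_wait, prob_wait_exceeds
```

For example, `mm1_wait_stats(2.0, 0.25)` describes an object that receives two lock requests per unit time and holds each lock for a quarter of a time unit on average.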

In general, a database transaction reads from a set of data objects (the
read-set) and writes to a set of data objects (the write-set). In this
paper, we consider both the read and the write operations of database
transactions, and propose a technique for performance evaluation.

We make the following observations from evaluations made with our 
technique. 

• As expected, the presence of shared locks has a substantial impact on
the probability of deadlock occurrence. When only 1/3 of the accessed
data objects are updated, the probability of deadlock is considerably
smaller than when all objects are updated.

• The observations about the deadlock probabilities are also valid for 
restart probabilities. 

• Transaction response times are also quite sensitive to the ratio of
shared locks. Here, we compare the response times when deadlocks 
are ignored with those obtained when deadlocks are considered. The 
effect of deadlocks is more predominant at higher transaction loads 
and with smaller values of read ratio. When 1/3 of the accessed ob- 
jects are updated, the effect of deadlocks is not significant on response 
time. 

• The effect of deadlocks on response time decreases as the number of
data items increases. This is due to the decrease in the
probability of conflicts and hence a decrease in deadlock occurrence.
When only 1/3 of the accessed data are updated, this effect is almost
insignificant. When 2/3 of the accessed data are updated, deadlocks
seem to have a noticeable effect on response time.

• When a small number of data objects are accessed, the probability of 
deadlock is negligible, and hence its effect on response time is small. 
When more data objects are accessed, the effect of deadlocks on re- 
sponse times is significant. 

5 Summary of Accomplishments in 1990-91 

We have published the results of our research (since August 1990) in two
conferences. In addition, two papers have been submitted for publication in
international journals. These are:

1. Y. Kuang and R. Mukkamala, “Performance Analysis of Static Locking
in Replicated Distributed Database Systems,” Proc. Southeastcon
1991, pp. 698-701.

2. Y. Kuang and R. Mukkamala, “A Note on the Performance Analysis of
Static Locking in Distributed Database Systems,” Submitted to IEEE
Trans. Computers, December 1990.

3. R. Mukkamala, “Effects of Distributed Database Modeling on Evaluation
of Transaction Rollbacks,” Proc. WSC’91, December 1990, pp.
839-845.

4. R. Mukkamala, “Measuring the Effects of Distributed Database Mod- 
els On Transaction Availability Measures,” Submitted to Performance 
Evaluation Journal, March 1991. 





In addition, our current work on building the prototype for a distributed 
system should result in several conference and journal papers in 1991-92. 

6 Proposed Research Efforts in 1991-92 

During the next grant period (August 1991 to July 1992), we propose to
continue the study and development of the distributed prototyping and
simulator system. The main problems that we need to solve in this period
are:

• Complete the graphic interface design and implement it on Sun work- 
stations. 

• Investigate flexible and efficient means of specifying the interfaces
between system components. We expect this phase to consume
considerable time.

• Design, build, and test a specific system using the primitives offered 
by the system. Experiences from building a specific system should aid 
us in developing a generalized prototyping tool. 

• We propose to use the prototype to evaluate the performance of several 
distributed mutual exclusion policies. Such a study may result in the 
development of new policies. 

• We propose to do further investigations in modeling of distributed 
systems and determine their impact on predictive analysis tools. 

ABSTRACT

Data distribution, degree of data replication, and transaction
access patterns are key factors in determining the performance
of distributed database systems. In order to simplify the evaluation
of performance measures, database designers and researchers
tend to make simplistic assumptions about the system. In this
paper, we investigate the effect of modeling assumptions on the
evaluation of one such measure, the number of transaction
rollbacks, in a partitioned distributed database system. We develop
six probabilistic models and develop expressions for the number
of rollbacks under each of these models. Essentially, the models
differ in terms of the available system information. The analytical
results so obtained are compared to results from simulation.
From here, we conclude that most of the probabilistic models
yield overly conservative estimates of the number of rollbacks.
The effect of transaction commutativity on system throughput is
also grossly undermined when such models are employed.

1. INTRODUCTION 

A distributed database system is a collection of cooperating 
nodes each containing a set of data items (In this paper, the 
basic unit of access in a database is referred to as a data item.). 
A user transaction can enter such a system at any of these nodes. 
The receiving node, sometimes referred to as the coordinating or 
initiating node, undertakes the task of locating the nodes that 
contain the data items required by a transaction. 

A partitioning of a distributed database (DDB) occurs when
the nodes in the network split into groups of communicating
nodes due to node or communication link failures. The nodes
in each group can communicate with each other, but no node in
one group is able to communicate with nodes in other groups. We
refer to each such group as a partition. The algorithms which
allow a partitioned DDB to continue functioning generally fall into
one of two classes [Davidson et al. 1985]. Those in the first class
take a pessimistic approach and process only those transactions
in a partition which do not conflict with transactions in other
partitions, assuring mutual consistency of data when partitions are
reunited. The algorithms in the second class allow every group
of nodes in a partitioned DDB to perform new updates. Since
this may result in independent updates to items in different
partitions, conflicts among transactions are bound to occur, and the
databases of the partitions will clearly diverge. Therefore, they
require a strategy for conflict detection and resolution. Usually,
rollbacks are used as a means for preserving consistency;
conflicting transactions are rolled back when partitions are reunited.
Because coordinating the undoing of transactions is a very difficult
task, these methods are called optimistic: they are useful
primarily in a situation where the number of items in a
particular database is large and the probability of conflicts among
transactions is small.

In general, determining if a transaction that successfully
executed in a partition is rolled back at the time the database
is merged depends on a number of factors. Data items in the
read-set and the write-set of the transaction, the distribution of
these data items among the other partitions, access patterns of
transactions in other partitions, data dependencies among the
transactions, and semantic relation (if any) between these
transactions are some examples of these factors. Exact evaluation of
rollback probability for all transactions in a database (and hence
the evaluation of the number of rolled back transactions)
generally involves both analysis and simulation, and requires large
execution times [Davidson 1982; Davidson 1984]. To overcome
the computational complexities of evaluation, designers and
researchers generally resort to approximation techniques [Davidson
1982; Davidson 1986; Wright 1983a; Wright 1983b]. These
techniques reduce the computation time by making simplifying
assumptions to represent the underlying distributed system. The
time complexity of the resulting techniques greatly depends on
the assumed model as well as evaluation techniques.

In this paper we are interested in determining the effect of the
distributed database models on the computational complexity
and accuracy of the rollback statistics in a partitioned database.

The balance of this paper is outlined as follows. Section 2
formally defines the problem under consideration. In Section 3, we
discuss the data distribution, replication, and transaction
modeling. Section 4 derives the rollback statistics for one distribution
model. In Section 5, we compare the analysis methods for six
models and simulation method for one model based on
computational complexity, space complexity, and accuracy of the measure.
Finally, in Section 6, we summarize the obtained results.

2. PROBLEM DESCRIPTION 

Even though a transaction $T_1$ in partition $P_1$ may be rolled
back (at merging time) by another transaction $T_2$ in partition $P_2$
due to a number of reasons, the following two cases are found to
be the major contributors [Davidson 1982].

i. $P_1 \neq P_2$, and there is at least one data item which is
updated by both $T_1$ and $T_2$. This is referred to as a write-write
conflict.

ii. $P_1 = P_2$, $T_2$ is rolled back, and it is a dependency parent of
$T_1$ (i.e., $T_1$ has read at least one data item updated by $T_2$,
and $T_2$ occurs prior to $T_1$ in the serialization sequence).

The above discussion on reasons for rollback only considers
the syntax of transactions (i.e. read- and write-sets) and does
not recognize any semantic relation between them. To be more
specific, let us consider transactions $T_1$ and $T_2$ executed in two
different partitions $P_1$ and $P_2$ respectively. Let us also assume
that the intersection between the write-sets of $T_1$ and $T_2$ is
non-empty. Clearly, by the above definition, there is a write-write
conflict and one of the two transactions has to be rolled back.
However, if $T_1$ and $T_2$ commute with each other, then there is no
need to roll back either of the transactions at the time of partition
merge [Garcia-Molina 1983; Jajodia and Speckman 1985; Jajodia
and Mukkamala 1990]. Instead, $T_1$ needs to be executed in $P_2$
and $T_2$ needs to be executed in $P_1$. The analysis in this paper
takes this property into account.

In order to compute the number of rollbacks, it is also
necessary to define some ordering ($O(P)$) on the partitions. For
example, if $T_1$ and $T_2$ correspond to case (i) above, and do not
commute, it is necessary to determine which of these two is
rolled back at the time of merging. Partition ordering resolves
this ambiguity by the following rule: whenever two conflicting
but non-commuting transactions are executed in two different
partitions, the transaction executed in the lower order
partition is rolled back.
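
The two rollback cases and the partition ordering rule can be sketched in code as follows. This is only a minimal illustration under stated assumptions, not the evaluation procedure of the paper: the `Txn` record, the `rollbacks` function, the representation of commutativity as a set of name pairs, and the numeric partition ranks are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Txn:
    name: str
    partition: int         # partition in which the transaction executed
    seq: int               # position in that partition's serialization order
    read_set: frozenset
    write_set: frozenset


def rollbacks(txns, order, commute):
    """Return (class1, class2) sets of rolled-back transaction names.

    order[p] : rank of partition p; the lower-ranked partition loses.
    commute  : set of frozenset({a, b}) name pairs assumed to commute.
    """
    # Case (i): write-write conflict across partitions, no commutativity.
    class1 = set()
    for a, b in combinations(txns, 2):
        if a.partition == b.partition:
            continue
        if not (a.write_set & b.write_set):
            continue
        if frozenset({a.name, b.name}) in commute:
            continue
        loser = a if order[a.partition] < order[b.partition] else b
        class1.add(loser.name)

    # Case (ii): a rolled-back dependency parent in the same partition.
    rolled = set(class1)
    class2 = set()
    changed = True
    while changed:                      # repeat until no new rollbacks appear
        changed = False
        for t in txns:
            if t.name in rolled:
                continue
            for parent in txns:
                if (parent.name in rolled
                        and parent.partition == t.partition
                        and parent.seq < t.seq
                        and t.read_set & parent.write_set):
                    rolled.add(t.name)
                    class2.add(t.name)
                    changed = True
                    break
    return class1, class2
```

The loop for case (ii) runs to a fixed point so that rollbacks propagate along chains of dependencies within a partition.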

Since a transaction may be rolled back due to either (i) or
(ii), we classify the rollbacks into two classes: Class 1 and Class
2 respectively. The problem of estimating the number of
rollbacks at the time of partition merging in a partially replicated
distributed database system may be formulated as follows.

Given the following parameters, determine the number of
rolled back transactions in class 1 ($R_1$) and class 2 ($R_2$).

• $n$, the number of nodes in the database;

• $d$, the number of data items in the database;

• $p$, the number of partitions in the distributed system (prior
to merge);

• $t$, the number of transaction types;

• $GD$, the global data directory that contains the location of
each of the $d$ data items; the $GD$ matrix has $d$ rows and $n$
columns, each entry of which is either 0 or 1;

• $NS_k$, the set of nodes in partition $k$, $\forall k = 1, 2, \ldots, p$;

• $RS_j$, the read-set of transaction type $j$, $j = 1, 2, \ldots, t$;

• $WS_j$, the write-set of transaction type $j$, $j = 1, 2, \ldots, t$;

• $N_{jk}$, the number of transactions of type $j$ received in
partition $k$ (prior to merge), $j = 1, 2, \ldots, t$, $k = 1, 2, \ldots, p$;

• $CM$, the commutativity matrix that defines transaction
commutativity. If $CM_{j_1 j_2}$ = true then transaction types $j_1$
and $j_2$ commute. Otherwise they do not commute.

The average number of total rollbacks is now expressed as
$R = R_1 + R_2$.

3. MODEL DESCRIPTION 

As stated in the introduction, the primary objective of this
paper is to investigate the effect of data distribution, replication,
and transaction models on estimation of the number of rollbacks
in a distributed database system.

To describe a data distribution-transaction model, we char- 
acterize it with three orthogonal parameters: 

1. Degree of data item replication (or the number of copies).

2. Distribution of data item copies.

3. Transaction characterization.

We now discuss each of these parameters in detail.

For simplicity, several analysis techniques assume that each
data item has the same number of copies (or degree of
replication) in the database system [Coffman et al. 1981]. Some other
techniques characterize the degree of replication of a database by
the average degree of replication of data items in that database
[Davidson 1986]. Others treat the degree of replication of each
data item independently.

Some designers and analysts assume some specific allocation
schemes for data item (or group) copies (e.g., [Mukkamala 1987]).
Assuming complete knowledge of data copy distribution ($GD$)
is one such assumption. Depending on the type of allocation,
such assumptions may simplify the performance analysis. Others
assume that each data item copy is randomly distributed among
the nodes in the distributed system [Davidson 1986].

Many database analysts characterize a transaction by the size
of its read-set and its write-set. Since different transactions may
have different sizes, these are either classified based on the sizes,
or an average read-set size and average write-set size are used to
represent a transaction. Others, however, classify transactions
based on the data items that they access (and not necessarily on
their size). In this case, transaction types are identified with their
expected sizes and the group of data items from which these are
accessed. An extreme example is a case where each transaction in
the system is identified completely by its read-set and its
write-set.

With these three parameters, we can describe a number of 
models. Due to the limited space, we chose to present the results 
for six of these models in this paper. 

We chose the following six models based on their applicability
in the current literature, and their close resemblance to practical
systems. In all these models, the rate of arrival of transactions
at each of the nodes is assumed to be completely known a priori.
We also assume complete knowledge of the partitions (i.e. which
nodes are in which partitions) in all the models.

Model 1: Among the six chosen models, this has the max- 
imum information about data distribution, replication, and 
transactions in the system. It captures the following infor- 
mation. 

• Replication: Data replication is specified for each data 
item. 

• Data distribution: The distribution of data items among 
the nodes in the system is represented as a distribution 
matrix (as described in Section 2). 

• Transactions: All distinct transactions executed in a
system are represented by their read-sets and
write-sets. Thus, for a given transaction, the model knows
which data items are read, and which data items are
updated. The commutativity information is also
completely known and is expressed as a matrix (as
described in Section 2).

Model 2: This model reduces the number of transactions
by combining them into a set of transaction types based on
commutativity, commonalities in data access patterns, etc.
Since the transactions are now grouped, some of the
individual characteristics of transactions (e.g. the exact
read-set and write-set) are lost. This model has the following
information.

• Replication: Average degree of replication is specified 
at the system level. 

• Data distribution: Since the read- and write- set infor- 
mation is not retained for each transaction type, the 
data distribution information is also summarized in 
terms of average data items. It is assumed that the 
data copies are allocated randomly to the nodes in the 
system. 

• Transactions: A transaction type is represented by
its read-set size, write-set size, and the number of
data items from which selection for read and write
is made. Since two transaction types might access the
same data item, it also stores this overlap information
for every pair of transaction types. The commutativity
information is stored for each pair of transaction
types.

Model 3: This model further reduces the transaction types
by grouping them based only on commutativity
characteristics. No consideration is given to commonalities in data
access pattern or differing read-set and write-set sizes. It
has the following information.

• Replication: Average degree of replication is specified 
at the system level. 

• Data distribution: As in model 2, it is assumed that 
the data copies are allocated randomly to the nodes 
in the system. 

• Transactions: A transaction type is represented by
the average read-set size and average write-set size.
The commutativity information is stored for all pairs
of transaction types.

Model 4: This model classifies transactions into three
types: read-only, read-write, and others. Read-only
transactions commute among themselves. Read-write
transactions neither commute among themselves nor commute with
others. The others class corresponds to update transactions
that may or may not commute with transactions in their
own class. This fact is represented by a commute
probability assigned to it.

• Replication: Average degree of replication is specified 
at the system level. 

• Data distribution: As in model 2, it is assumed that
the data copies are allocated randomly to the nodes
in the system.

• Transactions: Read-only class is represented by
average read-set size. The read-write class is represented
by average read-set and write-set sizes. The others
class is represented by the average read-set size,
average write-set size and the probability of commutation.

Model 5: This model reduces the transactions to two
classes: read-only and read-write. Read-only transactions
commute among themselves. The read-write transactions
correspond to update transactions that may or may not
commute with transactions in their own class. This fact is
represented by a commute probability assigned to it.

• Replication: Average degree of replication is specified 
at the system level. 

• Data distribution: As in model 2, it is assumed that 
the data copies are allocated randomly to the nodes 
in the system. 

• Transactions: Read-only class is represented by
average read-set size. The read-write class is represented
by average read-set and write-set sizes, and the
probability of commutation.

Model 6: This model identifies read-only transactions and 
other update transactions. But these two types have the 
same average read-set size. Update transactions may or 
may not commute with other update transactions. 

• Replication: Average degree of replication is specified 
at the system level. 

• Data distribution: As in model 2, it is assumed that
the data copies are allocated randomly to the nodes
in the system.

• Transactions: The read-set size of a transaction is de- 
noted by its average. For update transactions, we also 
associate an average write-set size and the probability 
of commutation. 

Among these, model 1 is very general, and assumes complete
information of data distribution ($GD$), replication, and
transactions. Other models assume only partial (or average) information
about data distribution and replication. Model 1 has the most
information and model 6 has the least.

4. COMPUTATION OF THE AVERAGES 

Several approaches offer potential for computing the average
number of rollbacks for a given system environment. The most
prominent methods are simulation and probabilistic analysis.

Using simulation, one can generate the data distribution matrix
(GD) based on the data distribution and replication policies
of the given model. Similarly, one can generate different
transactions (of different types) that can be received at the nodes in
the network. Since the partition information is also specified,
by searching the relevant columns of the GD matrix it is
possible to determine whether a given transaction can be
successfully executed in a given partition. Once all the successful
transactions have been identified, and their data dependencies
are identified, it is possible to identify the transactions that need
to be rolled back at the time of merging. The generation and
evaluation process may have to be repeated a sufficient number of
times to get the required confidence in the final result.
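
The simulation procedure described above can be sketched as follows. This is an illustrative sketch only, not the simulator used in the study: the function names, the fixed number of copies c per data item, and the rule that a transaction is executable when every item it accesses has at least one copy inside the partition are all assumptions made for the example.

```python
import random

def random_gd(d, n, c, rng):
    """Assumed allocation: each of d items gets c copies on distinct random nodes."""
    return [set(rng.sample(range(n), c)) for _ in range(d)]

def executable(items, gd, partition):
    """Assumed rule: every accessed item needs at least one copy in the partition."""
    return all(gd[i] & partition for i in items)

def estimate_success(d, n, c, partition, r, trials, seed=0):
    """Fraction of random r-item transactions executable in the given partition."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        gd = random_gd(d, n, c, rng)       # regenerate the distribution matrix
        items = rng.sample(range(d), r)    # one random transaction
        hits += executable(items, gd, partition)
    return hits / trials

if __name__ == "__main__":
    # 500 items, 10 nodes, 3 copies each, a partition holding nodes 0-3
    print(estimate_success(d=500, n=10, c=3, partition={0, 1, 2, 3}, r=4, trials=2000))
```

Repeating the generation step inside the loop mirrors the remark that the process must be repeated often enough to reach the required confidence; a larger `trials` value narrows the estimate.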


Probabilistic analysis is especially useful when interest is
confined to deriving the average behavior of a system from a given
model. Generally, it requires less computation time. In this
paper, we present detailed analysis for model 6, and a summary of
the analysis for models 1-5.

4.1 Derivations for Model 6

This model considers only two transaction types: read-only
(Type 1) and read-write (Type 2). Both have the same
read-set size of $r$. A read-write transaction updates $w$ of the data
items that it reads. $N_{1k}$ and $N_{2k}$ represent the rate of arrival of
types 1 and 2 respectively at partition $k$. The average degree
of replication of a data item is given as $c$. The system has $n$
nodes and $d$ data items. The probability that two read-write
transactions commute is $m$.

Let us consider an arbitrary transaction $T_1$ received at one
of the nodes in partition $k$ with $n_k$ nodes. Since the copies of
a data item are randomly distributed among the $n$ nodes, the
probability that a single data item is accessible in partition $k$ is
given by

$$\alpha_k = 1 - \frac{\binom{n-n_k}{c}}{\binom{n}{c}} \qquad (1)$$

Since each data item is independently allocated, the expected
number of data items available in this partition is $d\alpha_k$.
Since $T_1$ accesses $r$ data items (on the average), the probability
that it will be successfully executed is $\alpha_k^r$. From here, the number
of successful transactions in $k$ is estimated as $\alpha_k^r N_{1k}$ and $\alpha_k^r N_{2k}$
for types 1 and 2 respectively.
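
As a quick numerical check of this estimate, the following sketch evaluates the accessibility probability and the expected number of successful transactions. It assumes the probability takes the standard form 1 - C(n - n_k, c)/C(n, c) for c copies placed on distinct, randomly chosen nodes. The function names and the sample values (10 nodes, a 4-node partition, 3 copies, read-set size 4, 1000 arrivals) are illustrative and are not taken from the tables.

```python
from math import comb

def accessibility(n, n_k, c):
    """Probability that at least one of c random copies lies in a partition of n_k nodes."""
    return 1 - comb(n - n_k, c) / comb(n, c)

def successful_transactions(n, n_k, c, r, arrivals):
    """Expected number of arrivals whose r data items are all accessible."""
    return accessibility(n, n_k, c) ** r * arrivals

if __name__ == "__main__":
    print(accessibility(10, 4, 3))
    print(successful_transactions(10, 4, 3, 4, 1000))
```

Note that `math.comb` returns 0 when the partition complement holds fewer than c nodes, so the probability correctly evaluates to 1 in that case.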

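In computing the probability of rollback of $T_1$ due to case (i),
we are only interested in transactions that update a data item in
the write-set of $T_1$ and do not commute with $T_1$. The probability
that a given data item (updated by $T_1$) is not updated in another
partition $k'$ by a non-commuting transaction (with respect to $T_1$)
is given by

$$\beta_{k'} = \left(1 - \frac{w}{d}\right)^{(1-m)\,\alpha_{k'}^{r} N_{2k'}} \qquad (2)$$

where $\alpha_{k'}$ is the accessibility probability of (1) evaluated for
partition $k'$. Given that a data item is available in $k$, the
probability that it is not available in $k'$ is given as

$$\gamma(k,k') = \frac{\binom{n-n_{k'}}{c} - \binom{n-n_k-n_{k'}}{c}}{\binom{n}{c} - \binom{n-n_k}{c}} \qquad (3)$$

where $n_k$ and $n_{k'}$ are the numbers of nodes in the two partitions.
From here, the probability that a data item available in $k$ is not
updated by any other transaction in higher order partitions is given
by

$$\delta_k = \prod_{k':\,O(k')>O(k)} \left[\gamma(k,k') + \bigl(1-\gamma(k,k')\bigr)\beta_{k'}\right] \qquad (4)$$

The probability that transaction $T_1$ is not in write-write conflict
with any other non-commuting transaction of higher-order
partitions is now given as

$$\mu_k = \delta_k^{\,w} \qquad (5)$$

From here, the number of transactions rolled back due to category
(i) may be expressed as $R_1 = \sum_{k=1}^{p} (1-\mu_k)\,\alpha_k^{r} N_{2k}$, where $p$ is the
number of partitions.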

To compute the rollbacks of category (ii), we need to determine
the probability that $T_1$ is rolled back due to the rollback of
a dependency parent in the same partition. If $T_2$ is a read-write
transaction in partition $k$, then the probability that $T_1$ depends
on $T_2$ (i.e., read-write conflict) is given by:
A t



• Model 3, in addition to the space required by models 4-6,
also requires $O(t^2)$ for the commutativity matrix. Thus it
requires $O(nt + t^2)$ space.


The probability that $T_1$ is not rolled back due to the rollback of
any of its dependency parents is now given by:


(Afcjn f 1 - An) 


• Model 2, in addition to the space required by model 3,
also requires $t^2$ space to store the data overlap information.
Thus, it requires $O(nt + t^2)$ storage.

Thus, model 1 has the largest storage requirement and model 6
has the least.


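where $N_k = N_{1k} + N_{2k}$ and $u = N_{2k}/(N_{1k} + N_{2k})$.

The total number of rolled back transactions due to category
(ii) is now estimated as $R_2 = \sum_{k=1}^{p} (1 - \nu_k)\,\alpha_k^{r}\,(N_{1k} + \mu_k N_{2k})$,
where $\nu_k$ denotes the probability, given above, that a transaction in
partition $k$ is not rolled back due to the rollback of any of its
dependency parents, and $\mu_k$ is the probability in (5). The
total number of rolled back transactions is $R = R_1 + R_2$.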

5. COMPARISON OF THE MODELS 

As mentioned in the introduction, the main objective of this 
paper is to determine the effect of data distribution, replication, 
and transaction models on the estimation of rollbacks. To achieve
this, we evaluate the desired measure using six different data
distribution and replication models. The comparison of these
evaluations is based on computational time, storage requirement, 
and the average values obtained. 

Due to the limited space, we could not present the detailed
derivations for the average values for models 2-6. The final
expressions, however, are presented in [Mukkamala 1990].

5.1 Computational Complexity 


5.3 Evaluation of the Averages 

In order to compare the effect of each of these models on 
the evaluation of the average rollbacks, we have run a number of 
experiments. In addition to the analytical evaluations for models 
1-6, we have also run simulations with Model 1. The results 
from these runs are summarized in Tables 1-7. These tables
report the number of transactions successfully executed
before partition merge (Before Merge), the number of rollbacks due
to class 1 ($R_1$), the rollbacks due to class 2 ($R_2$), and the transactions
considered to be successful at the completion of merge (After
Merge). The last term is computed from the earlier three terms
(After Merge = Before Merge - $R_1$ - $R_2$). In all these tables, the
total number of transaction arrivals into the system during
partitioning is taken to be 65000. Also, each node is assumed to
receive an equal share of the incoming transactions.

• Table 1 summarizes the effect of number of partitions as
measured with Models 1-6. Here, it is assumed that each
of the data items in the system has exactly c = 3 copies.
The other assumptions in models 1-6 are as follows: 


We now analyze each of the evaluation methods (for models 
1-6) for their computational complexity. 

• In model 1, all t transactions are completely specified, and
the data distribution matrix is also known. To determine
if a transaction is successful, we need to scan the
distribution matrix. Similarly, determining if a transaction in
a lower order partition is to be rolled back due to a write-write
conflict with a transaction of higher order partition
requires comparison of write-sets of the two transactions.
Determining if a transaction needs to be rolled back due to
the rollback of a dependency parent also requires a search.
All this requires $O(\ldots + p\,d\,t^2 + p\,t^2 N)$ time, where $t$ is the
number of transaction types and $N$ is the maximum number of
transactions executed in a partition prior to the merge.

• Models 2-6 have a similar computation structure. The number
of transaction types ($t$) is high for model 2 and low for
model 6. Each of these models requires $O(p^2 t^2 c + p\,t^2 N)$
time. As before, $t$ is the number of transaction types and
$N$ is the maximum number of transactions executed in a
partition prior to the merge.

Thus, model 1 is the most complex (computationally) and model
6 is the least complex.

5.2 Space Complexity 

We now discuss the space complexity of the six evaluation 
methods: 

• Model 1 requires $O(dn)$ to store the data distribution matrix,
$O(n)$ to store the partition information, $O(dt)$ to store
the data access information, and $O(nt)$ to store the
transaction arrival information. It also requires $O(t^2)$ to store
the commutativity information. Thus, it requires
$O(dn + dt + nt + t^2)$ space to store model information.

• Models 4-6 require similar information: O(t) to store the 
average size of read- and write- sets of transaction types, 
$O(nt)$ for transaction arrival, $O(n)$ for partition information,
and $O(t)$ for commute information. Thus they require
$O(nt)$ space.


1. Model 1 considers 130 transaction types in the system.
Each is described by its read- and write-sets and
whether it commutes with the other transactions. 90
of the 130 are read-only transactions. The rest of the
40 are read-write. Among the read-write, 15 commute
with each other, another 10 commute with each other,
and the rest of the 15 do not commute at all. The
simulation run takes the same inputs but evaluates the
averages by simulation.

2. Model 2 maps the 130 transaction types into 4 classes.
To make the comparisons simple, the above four classes
(90+15+10+15) are taken as four types. The data
overlap is computed from the information provided in
model 1.

3. Model 3, to facilitate comparison of results, considers 
the above 4 classes. This model, however, does not 
capture the data overlap information. 

4. Model 4 considers three types: read-only, read-write 
that commute among themselves with some probabil- 
ity, and read-write that do not commute at all. 

5. Model 5 considers read-only transactions with read-set
size of 3 and read-write transactions with read-set size
of 6. Read-write transactions commute with a given
probability.

6. Model 6 only considers the average read-set size
(computed as 4 in our case), the portion of read-write
transactions (= 15/130), and the average write-set size for
a read-write (= 2). Probability that any two transactions
commute is taken to be 0.4.

From Table 1 it may be observed that:

• The analytical results from the analysis of Model 1 are a
close approximation of the ones from simulation.

• The evaluation of number of successful transactions 
prior to the merge is well approximated by all the 
models. Model 6 deviated the most. 

• The difference in estimations of $R_1$ and $R_2$ is significant
across the models. Model 1 is closest to the
simulation. Model 3 has the worst accuracy. Model
5, surprisingly, is somewhat better than Models 2, 3, 4,
and 6.

• The estimation of $R_1$ from models 2-6 is about 50
times the estimation from Model 1. The estimations
from Model 1 and the simulation are quite close.
From here, we can see that Models 2-6 yield overly
conservative estimates of the number of rollbacks at
the time of partition merge. While Model 1 estimated
the rollbacks as 1200, Models 2-6 have approximated
them as about IJU00.

• This difference in estimations seems to exist even when
the number of partitions is increased.

• Table 2 summarizes the effect of number of copies on the
evaluation accuracies of the models. It may be observed
that

• The difference between evaluations from Model 1 and
the others is significant at low (c = 3) as well as high
(c = 8) values of c. Clearly, the difference is more
significant at high degrees of replication.

• The case $p_1 = 4$, $p_2 = 6$, $c = 8$ corresponds to a case
where each of the 500 data items is available in both
the partitions. This is also evident from the fact that
all the 65000 input transactions are successful prior to
the merge.

• The results from the analysis of Model 1 are close to
those from its simulation.

• Table 3 shows the effect of increasing the number of nodes
from 10 (in Table 1) to 20. For large values of $n$, all the six
models result in good approximations of successful
transactions prior to merge. The differences in estimations of $R_1$
and $R_2$ still persist.

• Table 4 compares models 5 and 6. While model 6 only
retains average read-set size information for any transaction,
model 5 keeps this information for read-only and read-write
transactions separately. This additional information
enabled model 5 to arrive at better approximations for $R_1$
and $R_2$. In addition, the effect of commutativity on $R_1$ and
$R_2$ is not evident until $m > 0.99$. This is counterintuitive.
The simplistic nature of the models is the real cause of this
observation. Thus, even though these models have resulted
in conservative estimates of $R_1$ and $R_2$, we can't draw any
positive conclusions about the effect of commutativity on
the system throughput.

• The comments that were made about the conservative
nature of the estimates from models 5 and 6 also apply to
model 2. These results are summarized in Table 5. Even
though this model has much more system information than
models 5 and 6, the results ($R_1$ and $R_2$) are not very
different. However, the effect of commutativity can now be seen
at $m > 0.95$.

• Having observed that the effect of commutativity is almost 
lost for smaller values of m in models 2-6, we will now look 
at its effect with model 1. These results are summarized 
in Table 6. Even at small values of m, the effect of com- 
mutativity on the throughput is evident. In addition, it 
increases with m. This observation holds at both small 
and large values of c. 

• In Table 7, we summarize the effect of variations in number
of copies. In Tables 1-6, we assumed that each data
item has exactly the same number of copies. This is more
relevant to Model 1. Thus we only consider this model in
determining the effect of copy variations on evaluation of $R_1$
and $R_2$. As shown in this table, the effect is significant. As
the variation in number of copies is increased, the number
of successful transactions prior to merge decreases. Hence,
the number of conflicts is also reduced. This results in
a reduction of $R_1$ and $R_2$. As long as the variations are
not very significant, the differences are also not significant.

6. CONCLUSIONS 

In this paper, we have introduced the problem of estimating 
the number of rollbacks in a partitioned distributed database
system. We have also introduced the concept of transaction
commutativity and described its effect on transaction rollbacks. For this
purpose, the data distribution, replication, and transaction
characterization aspects of distributed database systems have been
modeled with three parameters. We have investigated the effect
of six distinct models on the evaluation of the chosen metric.
These investigations have resulted in some very interesting
observations. This study involved developing analytical equations
for the averages, and evaluating them for a range of parameters.
We also used simulation for one of these models. Due to lack
of space, we could not present all the obtained results in this
paper. In this section, we summarize our conclusions from
these investigations.

• Random data models that assume only average information 
about the system result in very conservative estimates of
system throughput. One has to be very cautious in
interpreting these results.

• Adding more system information does not necessarily lead 
to better approximations. In this paper, the system
information is increased from model 6 to model 2. Even though
this increases the computational complexity, it does not
result in any significant improvement in the estimation of the
number of rollbacks.

• Model 1 represents a specific system. Here, we define the 
transactions completely. Thus it is closer to a real-life
situation. Results (analytical or simulation) obtained from
this model represent actual behavior of the specified
system. However, results obtained from such a model are too
specific, and can't be extended to other systems.

• Transaction commutativity appears to significantly reduce
transaction rollbacks in a partitioned distributed database
system. This fact is only evident from the analysis of model
1. On the other hand, when we look at models 2-6, it is
possible to conclude that commutativity is not helpful
unless it is very, very high. Thus, conclusions from model 1
and models 2-6 appear to be contradictory. Since
models 3-6 assume average transactions that can randomly
select any data item to read (or write), the evaluations from
these models are likely to predict higher conflicts and hence
more rollbacks. The benefits due to commutativity seem to
disappear in the average behavior. Model 1, on the other
hand, describes a specific system, and hence can accurately
compute the rollbacks. It is also able to predict the benefits
due to commutativity more accurately.

• The distribution of number of copies seems to affect the 
evaluations significantly. Thus, accurate modeling of this 
distribution is vital to evaluation of rollbacks. 

In addition to developing several system models and evalua- 
tion techniques for these models, this paper has one significant 
contribution to the modeling, simulation, and performance anal- 
ysis community. 

If an abstract system model with average information is 
employed to evaluate the effectiveness of a new technique 
or a new concept, then we should only expect conservative 
estimates of the effects. In other words, if the results from 
the average models are positive, then accept the results. 
If these are negative, then repeat the analysis with a less 
abstracted model. Concepts/techniques that are not
appropriate for an average system may still be applicable for
some specific systems. 

Table 1. Effect of Number of Partitions on Rollbacks

Table 3. Effect of Number of Nodes on Rollbacks

Table 4. Effect of m on Rollbacks (Models 5 and 6: $p_1 = 4$, $p_2 = 6$, c =

Table 5. Effect of m on Rollbacks (Model 2: $p_1 = 4$, $p_2 = 6$)

Abstract

Data distribution, data replication, and system reliability are key fac- 
tors in determining the availability measures for transactions in distributed 
database systems. In order to simplify the evaluation of these measures, 
database designers and researchers tend to make unrealistic assumptions 
about these factors. In this paper, we investigate the effect of such assump- 
tions on the computational complexity and accuracy of such evaluations. We 
represent a database system with five parameters related to the above fac- 
tors. Probabilistic analysis is employed to evaluate the availability of read- 
only and read-write transactions. We consider both the read-one/write-all 
and the majority-read/majority- write replication control policies. We con- 
clude that transaction availability is more sensitive to variations in degrees 
of replication, less sensitive to data distribution, and insensitive to reliabil- 
ity variations in a heterogeneous system. The computational complexity of 
the evaluations is found to be mainly determined by the chosen distributed 
database model, while the accuracy of the results is not so much dependent
on the models.

Keywords and phrases: Availability, Database models, Distributed 
Systems, Distributed Database Systems, Performance Evaluation, Proba- 
bilistic Analysis, Reliability, Transaction Availability 



Measuring the Effects of Distributed Database Models 
On Transaction Availability Measures 


1 Introduction 

A distributed database system is a collection of cooperating nodes each 
containing a set of data objects¹. A user transaction can enter such a
system at any of these nodes. The receiving node, sometimes referred to
as the coordinating or initiating node, undertakes the task of locating the 
nodes that contain the data objects required by a transaction. 

When we consider systems that require high guarantees for successful 
execution of transactions (especially for read-only transactions), it is impor- 
tant to consider transaction availability. Even though there are a number 
of availability (and reliability) metrics defined for computer systems [2,9], 
in this paper we choose two metrics: start availability (TSA) and finish
availability (TFA).

Transaction start availability (TSA) defines the probability with which 
a transaction can successfully start its execution. By our definition, a
transaction is said to have a successful start when it can access all the required²
copies of the data objects that it needs for its execution. For simplicity, we 
consider a data copy at a node to be available for access when that node 
is up and it is accessible from the node that is currently coordinating the 
execution of the transaction. A transaction can start its execution as soon 
as all the required data object copies are available. 

Transaction finish availability (TFA) defines the probability with which a 
transaction can complete its execution, given that it has started its execution 
successfully. If execution times for transactions are negligible (as compared 
to the mean-time-to-fail of the components), then this reliability will be
close to 1. However, since transactions take a finite but significant amount
of time to execute, it is quite possible that the nodes that are involved in 
the execution of a transaction (and available at the start of execution) may 

1 In this paper, the basic unit of access in a database is referred to as a data object. 

2 The number of copies of an object that are required to be accessed by a transaction 
depends on the operation (read or write) and the replica copy control (e.g. read-one/write- 
all, majority) [3,18]. 
fail during its execution. In this case, the transaction is said to be aborted.
In such cases, the execution needs to be restarted. 

Formal definitions and evaluation of these two metrics (TSA and TFA) 
depend on several factors such as the fault model of the system (including the 
reliabilities of the system components), the transaction execution policy, the 
data distribution policy, the degree of data replication, the concurrency and 
commit protocols, and the characteristics of the given transaction [4,7,9]. In 
addition, TFA depends on the execution times of transactions. 

Even though it is theoretically possible to formulate equations expressing 
the two metrics in terms of the above mentioned factors, the evaluation of 
these equations is extremely cumbersome and requires unreasonably high 
computation times. The evaluation of the exact values for these measures 
generally involves both analysis and simulation. Evaluation tools with such 
large execution times are certainly not acceptable to a database designer 
who needs to evaluate a number of such possible database configurations 
before arriving at a final design. 

To overcome these problems, designers and researchers generally resort 
to approximation techniques [7,8,16]. These techniques reduce the compu- 
tation time by making simplifying assumptions regarding data distribution, 
data replication, and transaction execution. The time complexity of these 
techniques primarily depends on the underlying model as well as the evalu- 
ation technique. 

The effect of data distribution and replication models on evaluation of 
transaction response time has been measured in earlier studies [13]. These 
studies indicate that the computational complexity of a selected database 
model does not necessarily reflect the accuracy of the resulting performance 
evaluations. In fact, a model requiring computational time of $O(n)$ has
yielded results very close to those from a complex model with $O(n^n)$
complexity.

In this paper, we study the effect of data distribution, data replication, 
and fault models on the accuracy of transaction availability evaluations. We 
employ probabilistic analysis to arrive at the estimates for the desired values 
for six typical models. 

The balance of this paper is outlined as follows. Section 2 formally 

3 Here, n denotes the number of nodes in a distributed system. 
defines the problem under consideration. In Section 3, we describe a clas-
sification scheme for data distribution and replication policies. Section 4 
illustrates the advantages of probabilistic analysis over simulation, and em- 
ploys this technique to evaluate the measures for two different models. In 
Section 5, we compare the analysis methods for six models based on com- 
putational complexity, space complexity, and the accuracy of the measures. 
Finally, in Section 6, we summarize the obtained results, and suggest a 
general approach for design and analysis of these systems. 


2 Problem Description 

In this paper, a read-only transaction is characterized by the average number 
of data objects that it reads (i.e., its read-set size). Similarly, a read-write 
transaction is characterized by the number of data objects that it reads 
(read-set size), and the number of data objects that it updates (write-set 

size). 

The problem of estimating the availability of a read-only transaction 
may be formulated as: 

Given the following parameters, estimate $TSA_s$ and $TFA_s$ for a read-only
transaction that requires $s$ data objects for read access.

• $n$, the number of nodes in the database⁴

• $N$, the index set for the nodes in the database; $N = \{1, 2, \ldots, n\}$

• $d$, the number of data objects in the database

• $D$, the index set for the data objects in the database;
$D = \{1, 2, \ldots, d\}$

• $GD$, the global data directory that contains the location of each of
the $d$ data objects; the $GD$ matrix contains $d$ rows and $n$ columns,
each entry of which is either a 0 or a 1; i.e., $GD_{ij} = 0$ or $1$,
$\forall i \in D$ and $\forall j \in N$

• the reliability of the nodes in the network. 

The problem of estimating the metrics for a read-write transaction can 
be similarly defined. 
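
To make the problem statement concrete, the sketch below computes the start availability of a read-only transaction exactly, by enumerating every up/down state of the nodes. It is a minimal illustration under stated assumptions, not the evaluation method of this paper: the function name is hypothetical, node failures are taken to be independent, the read-one policy is assumed (one live copy per object suffices), every live node is assumed reachable, and objects and nodes are indexed from 0 rather than 1.

```python
from itertools import product

def tsa_read_only(gd, read_set, reliability):
    """Exact TSA by enumerating all up/down states of the n nodes (2**n terms)."""
    n = len(reliability)
    total = 0.0
    for state in product((0, 1), repeat=n):   # state[j] == 1 means node j is up
        prob = 1.0
        for j, up in enumerate(state):
            prob *= reliability[j] if up else 1.0 - reliability[j]
        # read-one (assumed): each object needs at least one copy on a live node
        if all(any(gd[i][j] and state[j] for j in range(n)) for i in read_set):
            total += prob
    return total

# GD with d = 2 objects (rows) and n = 3 nodes (columns)
gd = [[1, 1, 0],
      [0, 1, 1]]
print(tsa_read_only(gd, [0, 1], [0.9, 0.9, 0.9]))
```

The loop visits 2^n node states, which illustrates why exact evaluation quickly becomes expensive as the number of nodes grows.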

4 Table 1 summarizes the notation used in this paper 
Symbol        Description

$a_i$         The number of data objects accessed from the i-th group
$c$           The average number of copies of a data object
$c_i$         The number of copies of a data object in the i-th class
$d$           The number of data objects in the database
$d_i$         The number of data objects in the i-th class
$g$           The number of data object groups
$k$           Number of live nodes
$n$           Number of nodes
$n_i$         The number of nodes in the i-th class
$p$           The number of copy classes
$q$           The number of reliability classes
$r$           The average node reliability
$r_i$         The reliability of a node in the i-th class
$s$           The size of the read-set
$A_1, A_2$    Policies representing the data grouping
$B_1, B_2$    Policies representing limits on the data objects per node
$C_1, C_2$    Policies representing the degree of replication
$D_1, D_2$    Policies representing the copy distribution
$E_1, E_2$    Policies representing the component reliability
$D$           The index set for the data objects in the database
$GA$          Group access vector representing the number of objects accessed
              from each class or group
$GD$          Global data directory (or dictionary)
$N$           The index set for the nodes in the database
$TSA_s$       Transaction start availability of a read-only transaction with
              read-set size s (read-one/write-all policy)
$TSA'$        Transaction start availability of a read-write transaction with
              read-set size x + y and write-set size y (read-one/write-all policy)
$TSA''$       Transaction start availability of a transaction with
              read-set size s (read-majority/write-majority policy)
$x$           The size of the read-only object set
$y$           The size of the read-write object set

Table 1: Notation

3 Model Description

As stated in the introduction, the primary objective of this paper is to in- 
vestigate the effect of data distribution, replication, and fault models on 
availability estimations and the computational complexity of these evalua- 
tions. 

To describe a data distribution, replication, and fault model, we charac- 
terize it with five orthogonal parameters: 

A - Object grouping (or clustering) 

B - Limits on the number of data objects per node 

C - Degree of object replication (or the number of copies) 

D - Constraints on distribution of object copies 

E - Constraints on component reliability 

We now discuss each of these parameters in detail. 

Some distributed database systems allocate individual data objects [5,
10]. We categorize this strategy as $A_1$. In other systems, data objects
are first partitioned into disjoint groups, and then the resulting groups are
allocated [12,16,17]. Thus, the copies of all the data objects in a given group
are allocated to the same set of nodes. We refer to this strategy as $A_2$.

Some database designers place no explicit limit on the number of data
objects that may be placed at a node [7]. This strategy is named $B_1$.
Others restrict the number of data objects that may be placed at a given
node. This may be attributed to storage limitations or to security reasons
[11]. We refer to this strategy as $B_2$.

For simplicity, several analysis techniques assume that each data object 
has the same number of copies (or degree of replication) in the database 
system [6,16]. Some other techniques characterize the degree of replication 
of a database by the average degree of replication of data objects in that 
database [7]. In this paper, both these categories are referred to as $C_1$.
Others treat the degree of replication of each data object independently.
We refer to this as strategy $C_2$.

Some database designers and analysts assume that each data object (or
group) copy is randomly distributed among the nodes in the distributed
system [7]. We refer to this as $D_1$. Others assume some specific allocation
schemes for data object (or group) copies [11]. Assuming complete
knowledge of data copy distribution (GD) is one such assumption. Depending
on the type of allocation, such assumptions may simplify the performance
analysis [13]. This category is referred to as $D_2$.

Again for simplicity, some database designers and analysts assume that 
all components (nodes and links) in a distributed system have the same 
reliability factor [1]. In this paper, we only consider node failures and node
repairs⁵. We let $E_1$ denote a policy where all nodes are assumed to have
the same reliability characteristics, and $E_2$ denote a policy where nodes are
classified based on their reliability characteristics.

Using this classification, any known data distribution, replication, and 
reliability policies may be categorized by these five parameters. For example, 
$\langle A_2, B_1, C_1, D_2, E_1 \rangle$ represents a policy where

1. Data objects are first grouped and then allocated. 

2. There is no explicit limit placed on the number of data objects (or 
groups) allocated to any node. 

3. Each group has the same average degree of replication. 

4. The copies of a group are distributed in some systematic manner
among the nodes in the system.

5. All nodes in the system have identical reliability characteristics.

With these five parameters, we can describe thirty-two basic policies.
Several variations of these basic schemes are possible due to variations in
systematic distributions ($D_2$), variations on the limits of data objects per
node ($B_2$), and the types of grouping ($A_2$). Due to space limitations, in this
paper we chose to present the results for six of these policies. The interested
reader may refer to [14] for an analysis of other policies.
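
The count of thirty-two follows from five parameters with two settings each (2^5 = 32). A short sketch that enumerates them is given below; the string labels such as "A1" are simply a plain-text stand-in for the subscripted policy names, and the variable names are illustrative.

```python
from itertools import product

PARAMETERS = "ABCDE"   # grouping, per-node limits, replication, distribution, reliability

# Every combination of setting 1 or 2 for each of the five parameters
policies = [tuple(f"{p}{v}" for p, v in zip(PARAMETERS, choice))
            for choice in product((1, 2), repeat=len(PARAMETERS))]

print(len(policies))
print(("A2", "B1", "C1", "D2", "E1") in policies)
```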

Footnote 5: That is, the underlying network structure almost always facilitates
communication among live nodes.

We chose the following six policies to study the effect of the above-mentioned
parameters on availability computations:

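Model 1: $\langle A_1, B_1, C_1, D_1, E_1 \rangle$

Model 2: $\langle A_2, B_1, C_1, D_1, E_1 \rangle$

Model 3: $\langle A_1, B_2, C_1, D_1, E_1 \rangle$

Model 4: $\langle A_1, B_1, C_2, D_1, E_1 \rangle$

Model 5: $\langle A_1, B_1, C_1, D_2, E_1 \rangle$

Model 6: $\langle A_1, B_1, C_1, D_1, E_2 \rangle$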

Among these, Model 1 represents a simple system that is computation- 
ally attractive (as shown in Table 2). Model 2 reflects the effect of data 
grouping on the evaluation. Similarly, Model 3 reflects the effect of placing 
limits on number of data objects. Model 4 represents the effect of varia- 
tions in number of copies of data objects on availability evaluation. Model 
5 shows the effect of biased or non-random distributions of data objects 
on the evaluation. Finally, Model 6 reflects the effect of non-homogeneous 
environment (i.e., different node reliability characteristics) on transaction 
availability evaluation. 

In the following section, we derive closed-form expressions for the average 
transaction availabilities for Models 1 and 2. 


4 Probabilistic Computation of the Availabilities 

There are several approaches for computing the availability of a given trans- 
action in a database. These computations assume a given data distribution, 
data replication, and fault models. We now look at two such methods, 
simulation and probabilistic analysis. 

Using simulation, one can generate the data distribution matrix (GD)
based on the data distribution and replication model. One can also generate
the reliabilities for each of the nodes in the system (footnote 6). Similarly, one can
generate all possible transactions (with different read-sets and write-sets) that
can be received at each of the nodes in the network. For each such transaction
received by the system, the data distribution matrix can be searched,
and its ability to access all the required data objects may be verified. In
addition to generating transactions, we should also generate node failures
and node repairs in the time domain. Thus, some transactions may not be
successful due to the inaccessibility of one or more data objects that they
require (due to node failures). With such statistics (of successful/unsuccessful
transactions) in hand, we can obtain the average availability of a transaction
of a given size. This average corresponds to a single distribution matrix. The
generation and evaluation process may have to be repeated a sufficient number
of times to get the required confidence in the final result. Since there are $d$ data
objects, there are $\binom{d}{s}$ possible transactions with read-set (footnote 7) size $s$, and there are
$n$ nodes where each of these may be received. Given a transaction and the
node where it is received, determining the state (successful/unsuccessful) of
a transaction takes at least $O(nd)$ computations (i.e., to scan the columns of
the GD matrix corresponding to available nodes). If the distribution matrix
is generated $k$ times, then the evaluation of the desired average availability for a
transaction of size $s$ takes $O(k n^2 d \binom{d}{s})$ time. In general, $k$ is a function of the
number of copies, the number of data objects, the number of nodes, and the
data distribution model, and it could be very high. Suppose $d = 100$, $s = 10$,
and $n = 10$; then this method requires approximately $10^{17} k$ computations.
Even for reasonable values of $k$, this is an unreasonably high computation
time.

Footnote 6: Here, we ignore the possibility of network partitioning, and thereby
ignore the link reliability factor.
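The order-of-magnitude estimate for the simulation approach can be checked directly. The sketch below simply evaluates the expression $k n^2 d \binom{d}{s}$ for the quoted parameters; the function name `simulation_cost` is an assumption made for illustration.

```python
from math import comb


def simulation_cost(n, d, s, k=1):
    """Evaluate k * n^2 * d * C(d, s), the simulation cost estimate.

    n: number of nodes, d: number of data objects,
    s: read-set size, k: number of generated distribution matrices.
    """
    return k * n * n * d * comb(d, s)


# Cost per generated distribution matrix for d = 100, s = 10, n = 10.
print(f"{simulation_cost(n=10, d=100, s=10):.2e}")
```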

To avoid this large evaluation time, we adopt probabilistic analysis. In 
this analysis, we essentially study the given data distribution and reliability 
model and arrive at an expression for the average transaction availability 
for a given read-set (or write-set) size. With probabilistic analysis, some 
data distribution models (e.g., Models 1 and 3) may require insignificant 
amounts of computation. Some may need moderate computation times (e.g., 
Models 2 and 6), whereas others may need large computation times (e.g., 
Models 4 and 5). Regardless of the model, all these need considerably less 
computation time (with more accuracy of results) than the corresponding 
simulation methods. 

We now illustrate the probabilistic method of analysis by applying it to
Models 1 and 2. Expressions for other models may be derived in a similar
manner. The interested reader may find the details of these derivations in [14].

Footnote 7: The corresponding term for write-sets of update transactions may be
easily written.

4.1 Derivation of Reliability Metrics for Model 1 

Model 1, designated as $\langle A_1, B_1, C_1, D_1, E_1 \rangle$, assumes the following about
the data distribution and replication:

[R1] The data objects are allocated individually (i.e., not grouped) to the
nodes.

[R2] There are no limits placed on the number of data objects that may be 
placed at each node. 

[R3] The average degree of replication (c) of a data object is given. 

[R4] The copies of a data object are allocated randomly. 

[R5] Each node in the system has identical reliability (= r). 

Further, to simplify the illustration of the current analysis, we make the 
following assumptions regarding the distribution of groups, and the partici- 
pating node set determination: 

[R6] Each transaction is equally likely to access any data object. 

[R7] The transactions that enter the distributed system are coordinated
by a set of reliable servers that search the distributed database system 
(i.e., the availability of nodes and their dictionaries) for the availability 
of the required data objects. 

Due to Rule R7, we will not distinguish transactions that are received
at different locations in the system. Thus, we will disregard the originating
node as a parameter in this analysis (footnote 8).

Footnote 8: The analysis can easily be extended to a situation where transactions
received at an unavailable node are automatically considered as unsuccessful.

4.1.1 Derivation of Availability for Read-only Transactions


Let us consider a read-only transaction $T_1$ with $s$ objects in its read-set and
received at one of the servers. Let us also assume that the copy control
algorithm follows a read-one/write-all policy. Thus $T_1$ needs to access any
one of the $c$ copies of a data object that it requires.

Given that exactly $k$ of the $n$ nodes are available (i.e., up), the probability
that at least one copy of a given data object is available is given by:

$$P_{k,1} = 1 - \frac{\binom{n-k}{c}}{\binom{n}{c}} \qquad (1)$$

By definition of the read-one/write-all policy, $P_{k,1}$ represents the probability
that a data object is available for read access in the system. Since each data
object is allocated independently to the nodes in the system (by Rules R1
and R2), the probability that all $s$ data objects required by $T_1$ are available
for read access within these $k$ nodes can then be expressed as:

$$P_{k,s} = P_{k,1}^{\,s} = \left[ 1 - \frac{\binom{n-k}{c}}{\binom{n}{c}} \right]^{s} \qquad (2)$$


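Assuming the reliability of any given node to be $r$ (from Rule R5), the
probability that $T_1$ has successfully started is:

$$TSA_s = \sum_{k=1}^{n} \binom{n}{k} r^{k} (1-r)^{n-k} P_{k,s} = \sum_{k=1}^{n} \binom{n}{k} r^{k} (1-r)^{n-k} \left[ 1 - \frac{\binom{n-k}{c}}{\binom{n}{c}} \right]^{s} \qquad (3)$$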
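As a numerical illustration, the start availability of a read-only transaction under Model 1 can be evaluated by summing over the number of available nodes $k$, weighting the probability that all $s$ objects have at least one copy among those $k$ nodes by the binomial probability of exactly $k$ nodes being up. This is a minimal sketch under that reading of Equation (3); the function name `tsa_read_only` is an assumption.

```python
from math import comb


def tsa_read_only(n, c, s, r):
    """Start availability of a read-only transaction (read-one/write-all).

    n: nodes, c: copies per object, s: read-set size, r: node reliability.
    """
    total = 0.0
    for k in range(1, n + 1):
        # Probability that exactly k of the n nodes are up.
        p_nodes = comb(n, k) * r**k * (1 - r) ** (n - k)
        # Probability that one object has at least one copy on an up node;
        # comb(n - k, c) is zero when fewer than c nodes are down.
        p_object = 1 - comb(n - k, c) / comb(n, c)
        total += p_nodes * p_object**s
    return total


print(tsa_read_only(n=10, c=3, s=5, r=0.9))
```

With full replication (`c == n`) every object is readable whenever any node is up, so the result reduces to $1 - (1-r)^n$, which is a convenient sanity check.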


Given that $T_1$ has successfully started, we will now compute the probability
with which it can be successfully completed. Let us assume that $n_s$
nodes are involved in the execution of $T_1$, and that it has an execution time
of $t$ units. Now, in order for $T_1$ to be successful, all these $n_s$ nodes have to
be available for at least $t$ units of time, given that they were available at the
start of execution. Assuming an exponential distribution for time between
node failures with a failure rate of $\lambda$, the probability that a node which is
available at time zero is available throughout time $t$ is given by:

$$A_t = e^{-t\lambda} \qquad (4)$$

From here, the probability that none of the $n_s$ nodes has failed during time
$t$ is given by:

$$TFA_s = A_t^{\,n_s} = e^{-n_s t \lambda} \qquad (5)$$


Estimating $n_s$ for transaction $T_1$ is a complex problem. This problem has
been well investigated and the details of the solutions may be found in [15].
In this paper, we assume that $n_s$ for $T_1$ has been obtained a priori for a
given data distribution and fault model.
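Once the number of participating nodes is known, the finish availability is a single exponential. The sketch below assumes independent, exponentially distributed times between node failures, as stated in the text; the function name and the parameter name `failure_rate` are illustrative.

```python
from math import exp


def tfa(n_s, t, failure_rate):
    """Probability that none of the n_s participating nodes fails
    during an execution of length t, given all were up at the start."""
    return exp(-n_s * t * failure_rate)


# Example: 4 participating nodes, 2 time units, failure rate 0.01 per unit.
print(tfa(n_s=4, t=2.0, failure_rate=0.01))
```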


4.1.2 Derivation of Availability for Read-write Transactions 

Let us now consider a read-write transaction $T_2$ with $s$ objects in its read-set
and $y$ objects in its write-set. Let us assume that for a given read-write
transaction, write-set $\subseteq$ read-set [3,7]. Thus, among the $s$ data objects, $y$
objects are both read and written, while $x = s - y$ data objects are only
read. (Note that the intersection of the read-only and the read-write sets of
the data objects is empty.) Since the replication control algorithm follows a
read-one/write-all policy, $T_2$ needs to access all $c$ copies of the $y$ data objects
and any one copy of the $x$ data objects.

Given that exactly $k$ of the $n$ nodes are available (i.e., up), the probability
that all $c$ copies of a given data object are available is given by:

$$P'_{k,1} = \frac{\binom{k}{c}}{\binom{n}{c}} \qquad (6)$$

Since each data object is allocated independently to the nodes in the system
(by Rules R1 and R2), the probability that all $y$ data objects required by
$T_2$ are accessible for update is expressed as:

$$P'_{k,y} = \left[ \frac{\binom{k}{c}}{\binom{n}{c}} \right]^{y} \qquad (7)$$

Similarly, the probability that all $x$ data objects are available for read access
may be computed as:

$$P_{k,x} = \left[ 1 - \frac{\binom{n-k}{c}}{\binom{n}{c}} \right]^{x} \qquad (8)$$

From here, the probability that $T_2$ is successfully started may be computed
as:

$$TSA'_{x,y} = \sum_{k=c}^{n} \binom{n}{k} r^{k} (1-r)^{n-k} \left[ \frac{\binom{k}{c}}{\binom{n}{c}} \right]^{y} \left[ 1 - \frac{\binom{n-k}{c}}{\binom{n}{c}} \right]^{x} \qquad (9)$$

The finish availabilities for $T_2$ may be similarly computed using Equations
(4) and (5), where $n_s$ is now replaced by $n_{x,y}$ [15].
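The read-write case can be sketched in the same way as the read-only case. The code below assumes that, with $k$ nodes up, an object is writable with probability $\binom{k}{c}/\binom{n}{c}$ (all $c$ copies on up nodes) and readable with probability $1 - \binom{n-k}{c}/\binom{n}{c}$, and that objects are allocated independently; the function name `tsa_read_write` is an assumption.

```python
from math import comb


def tsa_read_write(n, c, x, y, r):
    """Start availability of a read-write transaction (read-one/write-all).

    x: objects that are only read, y: objects that are read and written.
    """
    total = 0.0
    for k in range(1, n + 1):
        p_nodes = comb(n, k) * r**k * (1 - r) ** (n - k)
        # All c copies of a written object must lie on the k up nodes;
        # comb(k, c) is zero for k < c, so those terms vanish.
        p_write = comb(k, c) / comb(n, c)
        # At least one copy of a read-only object must lie on an up node.
        p_read = 1 - comb(n - k, c) / comb(n, c)
        total += p_nodes * p_write**y * p_read**x
    return total


print(tsa_read_write(n=10, c=3, x=3, y=2, r=0.95))
```

Summing from $k = 1$ rather than $k = c$ gives the same value, because the write factor is zero whenever fewer than $c$ nodes are up and $y > 0$.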



4.1.3 Derivation of Availability for Transactions with Majority 
Consensus 


In the above two sections, we dealt with the read-one/write-all replication
control policy. The majority consensus protocols [18], which require the
accessibility of at least a majority of the total copies of a data object for both
read and write operations, are very attractive in a failure-prone environment.
Since both read and write operations require the same number of copies of
a data object, in this analysis we do not distinguish between read-only and
update transactions. Here, we simply refer to $T_1$ as a transaction.

Let $m = \lceil (c+1)/2 \rceil$ represent the majority of copies. Then the expression for
start availability for $T_1$ is given as:

$$TSA''_s = \sum_{k=m}^{n} \binom{n}{k} r^{k} (1-r)^{n-k} \left[ \sum_{i=m}^{c} \frac{\binom{k}{i} \binom{n-k}{c-i}}{\binom{n}{c}} \right]^{s} \qquad (10)$$

Similarly, the expression for the finish availability for $T_1$ may be expressed
as:

$$TFA''_s = A_t^{\,n_s} = e^{-n_s t \lambda} \qquad (11)$$

where $n_s$ now represents the average number of nodes accessed for executing
$T_1$ with the majority consensus protocol [15].
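For majority consensus, the per-object term is a hypergeometric tail: the probability that at least $m$ of an object's $c$ copies fall on the $k$ up nodes. The following sketch evaluates the start availability on that basis; the function name `tsa_majority` is an assumption.

```python
from math import comb


def tsa_majority(n, c, s, r):
    """Start availability under majority-read/majority-write.

    A transaction of size s starts only if every object has at least
    m = floor(c/2) + 1 (a majority) of its c copies on up nodes.
    """
    m = c // 2 + 1  # equals ceil((c + 1) / 2)
    total = 0.0
    for k in range(m, n + 1):
        p_nodes = comb(n, k) * r**k * (1 - r) ** (n - k)
        # Hypergeometric tail: i of the c copies lie on the k up nodes.
        p_object = sum(
            comb(k, i) * comb(n - k, c - i) for i in range(m, c + 1)
        ) / comb(n, c)
        total += p_nodes * p_object**s
    return total


print(tsa_majority(n=10, c=3, s=5, r=0.9))
```

With a single copy per object (`c == 1`) the majority is one copy, so the value coincides with the read-one case.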
4.2 Derivation of Transaction Availability for Model 2

Model 2, designated as $\langle A_2, B_1, C_1, D_1, E_1 \rangle$, is similar to Model 1, except
that the data objects are now grouped, and the groups are then allocated
to nodes in the system. This may be described as:

[R9] The data objects are first grouped and the groups are then allocated
to the nodes. Let the $d$ data objects be partitioned into $t$ distinct
groups. Let $d_k$ represent the number of data objects in group $k$. Thus,
$\sum_{k=1}^{t} d_k = d$.

[R10] There are no limits placed on the number of groups that may be placed
at each node.

[R11] The degree of replication is the same for each group ($c$).

[R12] The copies of a group are allocated randomly. 

[R13] Each node in the system has identical reliability (r). 

Again, to simplify analysis, we make the following assumptions: 

[R14] Each transaction is equally likely to access any data object. 

[R15] The transactions that enter the distributed system are coordinated 
by a set of reliable servers that search the distributed database system 
(i.e., the availability of nodes and their dictionaries) for the availability 
of required data objects. 

4.2.1 Derivation of Availability for Read-only Transactions 

Once again let us consider transaction $T_1$ executing under a read-one/write-all
policy. Given that $l$ of the $n$ nodes are available (i.e., up), the probability
that at least one copy of group $k$ is available is given by:

$$1 - \frac{\binom{n-l}{c}}{\binom{n}{c}} \qquad (12)$$

If the vector $GA = \langle a_1, a_2, \ldots, a_t \rangle$ represents the number of data objects
accessed by $T_1$ from each of the $t$ groups, then the probability that $T_1$ is
successfully started may be computed as:

$$TSA_s = \sum_{GA} \Pr(GA) \sum_{l=1}^{n} \binom{n}{l} r^{l} (1-r)^{n-l} \prod_{k=1}^{t} \left[ 1 - \frac{\binom{n-l}{c}}{\binom{n}{c}} \right]^{f(k)} \qquad (13)$$

$$\Pr(GA) = \frac{\binom{d_1}{a_1} \binom{d_2}{a_2} \cdots \binom{d_t}{a_t}}{\binom{d}{s}} \qquad (14)$$

$$f(k) = \begin{cases} 1 & \text{if } a_k > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (15)$$

$$GA = \langle a_1, a_2, \ldots, a_t \rangle, \quad \sum_{k=1}^{t} a_k = s \ \text{ and } \ \forall k,\ 1 \le k \le t:\ 0 \le a_k \le d_k \qquad (16)$$

When data objects are equally distributed among the groups (i.e., $d_1 = d_2 =
\cdots = d_t = d/t$), then this expression may be further simplified as:

$$TSA_s = \sum_{l=1}^{n} \sum_{k=1}^{t} \binom{n}{l} r^{l} (1-r)^{n-l} \, [\ldots] \qquad (17)$$

The expression for $TFA_s$ is the same as in Equation (5).
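The grouped read-only case can be illustrated by enumerating the access vectors directly. The sketch below assumes the general (non-simplified) form: each vector $GA$ is weighted by a multivariate hypergeometric probability, and each group touched by the transaction contributes one factor of the per-group read probability. The names `ga_vectors` and `tsa_read_only_grouped` are assumptions, and the brute-force enumeration is only practical for small $t$ and $s$.

```python
from itertools import product
from math import comb, prod


def ga_vectors(group_sizes, s):
    """Yield every vector (a_1..a_t) with 0 <= a_k <= d_k and sum a_k = s."""
    ranges = [range(min(d_k, s) + 1) for d_k in group_sizes]
    for ga in product(*ranges):
        if sum(ga) == s:
            yield ga


def tsa_read_only_grouped(n, c, s, r, group_sizes):
    """Start availability of a read-only transaction when objects are grouped."""
    d = sum(group_sizes)
    total = 0.0
    for ga in ga_vectors(group_sizes, s):
        # Probability that the s accessed objects split across groups as ga.
        pr_ga = prod(comb(d_k, a_k) for d_k, a_k in zip(group_sizes, ga)) / comb(d, s)
        touched = sum(1 for a_k in ga if a_k > 0)  # groups with f(k) = 1
        for l in range(1, n + 1):
            p_nodes = comb(n, l) * r**l * (1 - r) ** (n - l)
            p_group = 1 - comb(n - l, c) / comb(n, c)
            total += pr_ga * p_nodes * p_group**touched
    return total


print(tsa_read_only_grouped(n=5, c=2, s=3, r=0.9, group_sizes=[4, 4, 4]))
```

When every group holds exactly one object, each accessed object is its own group and the value matches the ungrouped Model 1 result.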

4.2.2 Derivation of Availability for Read-write Update Transactions

Let us consider transaction $T_2$ which requires $x$ objects for read-only operations
and $y$ data objects for read and write operations ($s = x + y$). Thus
we need to define two GA vectors for the read-only and read-write data object
sets:

$$GA' = \langle a'_1, a'_2, \ldots, a'_t \rangle, \quad \sum_{k=1}^{t} a'_k = x \ \text{ and } \ \forall k,\ 1 \le k \le t:\ 0 \le a'_k \le d_k$$

$$GA'' = \langle a''_1, a''_2, \ldots, a''_t \rangle, \quad \sum_{k=1}^{t} a''_k = y \ \text{ and } \ \forall k,\ 1 \le k \le t:\ 0 \le a''_k \le d_k - a'_k$$
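In computing $TSA'_{x,y}$, we should recall that if a data object is write
accessible under given node availability conditions, it is also read accessible.
However, the reverse is not true. These two facts are made use of in deriving
the following expression for $TSA'_{x,y}$:

$$TSA'_{x,y} = \sum_{GA'} \sum_{GA''} \Pr(GA') \Pr(GA'') \sum_{l=1}^{n} \binom{n}{l} r^{l} (1-r)^{n-l} \prod_{k=1}^{t} \left[ 1 - \frac{\binom{n-l}{c}}{\binom{n}{c}} \right]^{f'(k)} \left[ \frac{\binom{l}{c}}{\binom{n}{c}} \right]^{f''(k)} \qquad (18)$$

where

$$\Pr(GA') = \frac{\binom{d_1}{a'_1} \binom{d_2}{a'_2} \cdots \binom{d_t}{a'_t}}{\binom{d}{x}}$$

$$\Pr(GA'') = \frac{\binom{d_1 - a'_1}{a''_1} \binom{d_2 - a'_2}{a''_2} \cdots \binom{d_t - a'_t}{a''_t}}{\binom{d-x}{y}}$$

$$f'(k) = \begin{cases} 1 & \text{if } a''_k = 0 \wedge a'_k > 0 \\ 0 & \text{otherwise} \end{cases}$$

$$f''(k) = \begin{cases} 1 & \text{if } a''_k > 0 \\ 0 & \text{otherwise} \end{cases}$$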
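A brute-force sketch of the grouped read-write case follows. It assumes the reading in which a group containing any written object must have all $c$ copies up (which also makes it readable), while a group containing only read objects needs one copy up; the read-only vector is drawn first and the read-write vector is drawn from the remaining objects. The function names are assumptions, and the nested enumeration is practical only for small inputs.

```python
from itertools import product
from math import comb, prod


def vectors(limits, total):
    """All vectors v with 0 <= v_k <= limits[k] and sum(v) == total."""
    ranges = [range(min(lim, total) + 1) for lim in limits]
    return [v for v in product(*ranges) if sum(v) == total]


def tsa_read_write_grouped(n, c, x, y, r, group_sizes):
    """Start availability for x read-only and y read-write objects (grouped)."""
    d = sum(group_sizes)
    result = 0.0
    for ga_r in vectors(group_sizes, x):
        pr_r = prod(comb(dk, a) for dk, a in zip(group_sizes, ga_r)) / comb(d, x)
        remaining = [dk - a for dk, a in zip(group_sizes, ga_r)]
        for ga_w in vectors(remaining, y):
            pr_w = prod(comb(rem, a) for rem, a in zip(remaining, ga_w)) / comb(d - x, y)
            write_groups = sum(1 for a in ga_w if a > 0)
            # Groups that are read from but not written to.
            read_groups = sum(1 for ar, aw in zip(ga_r, ga_w) if ar > 0 and aw == 0)
            for l in range(1, n + 1):
                p_nodes = comb(n, l) * r**l * (1 - r) ** (n - l)
                p_read = 1 - comb(n - l, c) / comb(n, c)
                p_write = comb(l, c) / comb(n, c)
                result += (pr_r * pr_w * p_nodes
                           * p_read**read_groups * p_write**write_groups)
    return result


print(tsa_read_write_grouped(n=4, c=2, x=2, y=1, r=0.9, group_sizes=[3, 3]))
```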


As before, when data objects are equally distributed among the groups
(i.e., $d_1 = d_2 = \cdots = d_t = d/t$), this expression may be simplified as:

$$TSA'_{x,y} = \sum_{l=1}^{n} \sum_{k_1=1}^{t} \sum_{k_2=0}^{t-k_1} \binom{n}{l} r^{l} (1-r)^{n-l} \, [\ldots] \qquad (19)$$

The finish availability $TFA'_{x,y}$ may be computed using Equation (5),
where $n_s$ is now replaced by $n_{x,y}$, which is assumed to be known a priori in
this paper.

4.2.3 Derivation of Availability for Transactions with Majority Consensus


As described in Section 4.1.3, under the majority consensus protocol both
the read-set and read-write set are treated in the same way for access probability
computations. Thus, we only consider a read-only transaction with
a read-set size of $s$. The expression for $TSA''_s$ can now be written as:

$$TSA''_s = \sum_{GA} \Pr(GA) \sum_{l=1}^{n} \binom{n}{l} r^{l} (1-r)^{n-l} \prod_{k=1}^{t} \left[ \sum_{i=m}^{c} \frac{\binom{l}{i} \binom{n-l}{c-i}}{\binom{n}{c}} \right]^{f(k)} \qquad (20)$$

$$m = \left\lceil \frac{c+1}{2} \right\rceil \qquad (21)$$

where $\Pr(GA)$ and $f(k)$ are as defined in Equations (14) and (15).

Once again, when data objects are equally distributed among the groups
(i.e., $d_1 = d_2 = \cdots = d_t = d/t$), this expression may be written as:

$$TSA''_s = [\ldots] \qquad (22)$$


5 Comparison of the Availabilities for the Six Models


As mentioned in the introduction, the main objective of this paper is to 
determine the effect of data distribution, replication, and fault models on 
the estimation of transaction availability. To achieve this, we evaluate the 
desired measure using six different models. The comparison of these evalua- 
tions is based on computational time, storage requirement, and the average 
values obtained. 

Due to space limitations, we cannot present the detailed derivations for 
the average values for Models 3-6. The final expressions, however, are sum- 
marized in the appendix. 




5.1 Computational Complexity 

We now analyze each of the evaluation methods (for Models 1-6) for their 
computational complexity. 

• Let us refer to Model 1. From Equations (3) and (9), it is clear
that computation of $TSA_s$ and $TSA'_{x,y}$ takes $O(cn^2)$ time (footnote 9).
Similarly, from Equation (10), it is clear that the computation of $TSA''_s$
requires $O(c^2 n^2)$ time.

• We now derive this complexity term for Model 2. Let us first look
at the computation of $TSA_s$. From Equation (14), we derive that
the computation of $\Pr(GA)$ requires $O(s)$ time. The number of GAs
generated is approximately $O(s^t)$, where $t$ represents the number of
data object groups. Given a GA vector and $\Pr(GA)$, computation of
$TSA_s$ requires $O(nct + n^2)$ arithmetic operations (from Equation (13)).
Thus the evaluation of $TSA_s$ requires $O(s^t (nct + n^2 + s))$ time.
Similarly, we can conclude that $TSA'_{x,y}$ requires $O(x^t y^t (nct + n^2 + s))$
time (Equation (19)), and $TSA''_s$ requires $O(s^t (nc^2 t + n^2 + s))$ time
(Equation (20)).

• For Model 3, the computational complexity for $TSA_s$ is $O(n^2 + n(s + c))$
(Equation (23)). Similarly, $TSA'_{x,y}$ and $TSA''_s$ require $O(n^2 + n(c + s))$
and $O(n^2 + n(c^2 + s))$ respectively (Equations (24) and (25)).

• The computational complexity for Model 4 depends on the number
of copy categories. Assuming that $s < d_k$ for $k = 1, 2, \ldots, p$, we can
generate approximately $s^p$ different CA vectors. Thus the computation
of $TSA_s$ requires $O(s^p (n^2 + npc + s))$ time. To compute $TSA'_{x,y}$, we need
to compute the number of possible $CA'$ and $CA''$ vectors. There are
approximately $x^p$ $CA'$ vectors and $y^p$ $CA''$ vectors. Thus, $TSA'_{x,y}$
requires $O(x^p y^p (npc + n^2 + s))$ time. Similarly, we can conclude that
$TSA''_s$ requires $O(s^p (npc^2 + n^2 + s))$.

• In Model 5, we assume that the entire data dictionary information
is available to us. Given a GD matrix and a node status vector $S$,
computation of $f(S)$, $f'(S)$, and $f''(S)$ requires $O(nd)$ time to search
the matrix. Given $n$, there are $2^n$ possible $S$ vectors. Thus the
computations of $TSA_s$, $TSA'_{x,y}$, and $TSA''_s$ require $O(2^n (nd + s))$ time.

• In Model 6, the number of NA vectors generated is $(n_1 + 1)(n_2 +
1) \cdots (n_q + 1)$. For simplification, we approximate it as $(n/q + 1)^q$. Given
a NA vector, the computation of $TSA_s$, $TSA'_{x,y}$, and $TSA''_s$ requires $O(s +
c + q)$, $O(s + c + q)$, and $O(sc + c^2 + qc)$ time respectively. Thus the three
metric evaluations require $O((n/q + 1)^q (s + c + q))$, $O((n/q + 1)^q (s + c + q))$,
and $O((n/q + 1)^q (cs + c^2 + cq))$ time respectively.

Footnote 9: Here, we are assuming that the evaluation of the terms $\binom{p}{q}$ and $p^q$
takes $O(q)$ and $O(1)$ time respectively.

These complexities are summarized in Table 2. From this table it may be
observed that models 1 and 3 are computationally very attractive. The
complexity of evaluations with models 2, 4, and 6 depends on the number of
groups, the number of copy variations, and the number of reliability
variations respectively. For systems with a large number of nodes, evaluations
with model 5 are very expensive.
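The approximation used for Model 6 can be compared against the exact count of NA vectors. In the sketch below, the exact count is the product of $(n_i + 1)$ over the reliability classes and the estimate is $(n/q + 1)^q$; both function names are assumptions. By the arithmetic-geometric mean inequality the estimate is never smaller than the exact count, and the two agree when all classes have equal size.

```python
from math import prod


def na_vector_count(class_sizes):
    """Exact number of NA vectors: each class i contributes n_i + 1 choices
    for its number of currently active nodes (0 through n_i)."""
    return prod(n_i + 1 for n_i in class_sizes)


def na_vector_estimate(n, q):
    """Approximation (n/q + 1)^q for n nodes split into q classes."""
    return (n / q + 1) ** q


sizes = [3, 3, 4]  # hypothetical split of n = 10 nodes into q = 3 classes
print(na_vector_count(sizes), na_vector_estimate(sum(sizes), len(sizes)))
```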

5.2 Space Complexity 

We now discuss the space complexity for the six models: 

• Models 1 and 3 just require the values of $d$, $c$, $s$, $r$, and $n$. Thus the
storage requirement is $O(1)$.

• Since Model 2 requires that the $d_k$ values be stored, and that the GA
vectors be generated, it requires $O(t)$ storage, where $t$ is the number
of data groups.

• Model 4 requires $O(p)$ storage to contain the $p$ copy classes.

• Model 5 requires $O(nd)$ storage for the GD matrix.

• Model 6 requires $O(q)$ storage to contain the node reliability class
information.

Thus, Model 5 has the largest storage requirement. These complexities are 
summarized in Table 3. 
Model   Read-only                        Read-write                         Majority
-----   ------------------------------   --------------------------------   --------------------------------
1       $O(cn^2)$                        $O(cn^2)$                          $O(c^2 n^2)$
2       $O(s^t (nct + n^2 + s))$         $O(x^t y^t (nct + n^2 + s))$       $O(s^t (nc^2 t + n^2 + s))$
3       $O(n^2 + nc + ns)$               $O(n^2 + nc + ns)$                 $O(n^2 + nc^2 + ns)$
4       $O(s^p (npc + n^2 + s))$         $O(x^p y^p (npc + n^2 + s))$       $O(s^p (npc^2 + n^2 + s))$
5       $O(2^n (nd + s))$                $O(2^n (nd + s))$                  $O(2^n (nd + s))$
6       $O((n/q + 1)^q (s + c + q))$     $O((n/q + 1)^q (s + c + q))$       $O((n/q + 1)^q (cs + c^2 + cq))$

Table 2: Computational Complexities for the Evaluation of Availabilities


Model   Space Complexity
-----   ----------------
1       $O(1)$
2       $O(t)$
3       $O(1)$
4       $O(p)$
5       $O(nd)$
6       $O(q)$

Table 3: Space Complexities for the Evaluation of Availabilities






5.3 Comparison of the Availabilities 

In order to compare the effectiveness of each of these models, we have evaluated
availabilities for a wide range of parameters. Due to space limitations,
in this paper, we only present a small subset of these results. Similarly,
since $TFA_s$, $TFA'_{x,y}$, and $TFA''_s$ are found to be insensitive to variations
in models, we are not presenting these results here. We only present the
results for the transaction start availabilities. These results are summarized
in Figures 1-7.

Figures 1-3 compare the availabilities obtained from the six models. The 
following assumptions are made for models 1-6: 

1. In Model 2, we assume that the d data objects are grouped into n 
data groups each containing d/n data objects. This is similar to the 
assumptions in [13]. 

2. In Model 3, we assume that each of the $n$ nodes in the system is
allocated exactly the same number of data objects (equal to $dc/n$).

3. In Model 4, we assume that $d/2$ data objects have $c$ copies, $d/4$ data
objects have $c + 1$ copies, and the rest have $c - 1$ copies. This keeps
the average number of copies the same (i.e., $c$) but brings a copy variation
factor into consideration.

4. In Model 5, we assume that the $d$ data objects are allocated
systematically so that the copies of the $i$-th data object are allocated, in a
circular manner, to the nodes starting from $(i \bmod n) + 1$.

5. In Model 6, we assume that $n/3$ nodes have reliability $r - 0.1$, $n/3$
have reliability $r + 0.1$, and the rest have a reliability $r$ (footnote 10).
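The systematic allocation assumed for Model 5 can be sketched as a small routine that builds the data distribution (GD) matrix. The code assumes that objects and nodes are numbered from 1, that the first copy of object $i$ goes to node $(i \bmod n) + 1$, and that the remaining $c - 1$ copies go to the following nodes in circular order; the function name and the boolean-matrix representation are illustrative choices.

```python
def circular_allocation(n, d, c):
    """Return gd where gd[i][j] is True when object i+1 has a copy at node j+1.

    Assumes c <= n so that the c copies of an object land on distinct nodes.
    """
    gd = [[False] * n for _ in range(d)]
    for i in range(1, d + 1):
        start = i % n  # zero-based index of node (i mod n) + 1
        for offset in range(c):
            gd[i - 1][(start + offset) % n] = True
    return gd


gd = circular_allocation(n=10, d=1000, c=3)
per_node = [sum(row[j] for row in gd) for j in range(10)]
print(per_node)  # number of object copies stored at each node
```

When $n$ divides $d$, this layout also gives every node the same load of $dc/n$ copies, which is the per-node allocation assumed for Model 3.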

Figure 1 summarizes the results for read-only transactions with read-one/write-all
policy. Figure 2 presents these results for transactions (read-only or
read-write) with majority-read/majority-write protocol. Finally, Figure 3
summarizes the results for read-write transactions with read-one/write-all
policy. From these results, we make the following observations:

Footnote 10: When $r = 0.95$, we assume that $n/3$ nodes have reliability $r - 0.05$,
$n/3$ have reliability $r + 0.05$, and the rest have a reliability $r$.

• For read-only transactions (with read-one/write-all policy),

(i) Evaluations with models 1 and 3 are close over the entire range
of $s$ and $r$.

(ii) Evaluations with models 2 and 5 are also close over the entire
range of $s$ and $r$. This may be explained by the fact that the
number of groups $g = n = 10$ for model 2 and the systematic
distribution for model 5 implicitly results in 10 groups. However,
they do differ in the manner in which these groups are distributed.

(iii) For r > 0.95, evaluations with all models, excepting model 4, are 
quite close. 

(iv) Evaluations with model 4 appear to significantly deviate from 
all other models for r > 0.75. This implies that modeling of 
the degree of replication is a very important task in availability 
evaluations. 

• For transactions with majority-read/majority-write policy, 

(v) Evaluations with models 1 and 3 appear to be close. Similarly, 
evaluations with models 2 and 5 are close. In addition, evalua- 
tions with model 6 are close to evaluations with models 1 and 
3. 

(vi) For $s > 25$, the availabilities appear to be independent of the
read-set size. This implies that computations for $s > 25$ are
redundant.

(vii) The evaluations with models 2 and 5 seem to differ at higher 
values of n. The evaluations with the other four models are close 
for n = 20. This is an interesting observation. 

(viii) Once again, the variations in degree of replication of individual
data objects appear to have a dominating effect on availability
evaluations.

• For read-write transactions with read-one/write-all policy,

(ix) The availabilities for $s > 5$ are significant only when $r > 0.99$.

(x) Since the availabilities are generally low, the effect of the
differences in the models seems to be insignificant. At high reliabilities
(i.e., $r > 0.99$), the evaluations with model 4 seem to deviate
from the evaluations with the other models.

We will now study the effect of the individual model parameters. 

• Models 1 and 3 are very simple, and need no further investigation. 

• Evaluations with model 2 represent the effect of data object group- 
ing on availability (Figure 4). As the number of groups is increased, 
the availability seems to be decreasing. This effect seems to dimin- 
ish for g > 25. This effect is insignificant for read-write transactions. 
Similarly, this effect seems to vanish at high node reliabilities. 

• Evaluations with model 4 represent the effect of variations in degrees
of replication of data objects (Figure 5). The effect of these variations
seems to be insignificant on read-write transactions. The effect
of copy variations seems to be more apparent at high node reliabilities.
Similarly, this effect seems to be more pronounced on read-only
transactions (with read-one/write-all policy) than the other two classes.

• Model 5 represents the effect of data distribution on the availability 
evaluations. From Figure 6, it may be observed that the distribution 
effect is only evident at s > 10. In addition, the effects are more 
significant for read-only transactions than the other two classes. The 
effect is less evident at high node reliabilities. 

• Model 6 represents the effect of node reliability variations on avail- 
abilities. From Figure 7, it may be observed that the variations have 
almost no effect on availability evaluations. 

6 Conclusions 

The current investigations on measuring the effect of data distribution,
replication, and fault models on transaction availability evaluation have resulted
in some very interesting observations. As part of this study, we chose six
models representing six different parametric assumptions that researchers
and designers generally tend to make in their analysis. Using probabilistic
analysis, we derived expressions for transaction availability for three
classes of transactions: read-only (read-one/write-all policy), transactions
with majority-read/majority-write policy, and read-write transactions (with
read-one/write-all policy). The effect of the six parameters is measured by
evaluating availabilities (for different read-set sizes). From here, we conclude
that:

• By choosing a proper distributed database model, the computational 
complexity of transaction availability evaluations can be significantly 
reduced. 

• For values of s < 10, all models result in almost the same transaction 
evaluation. 

• It is not necessary to evaluate transaction availabilities for values of 
s > 25. 

• Evaluations for the read-only transactions (with read-one/write-all 
policy) are more sensitive to database modeling than the other two 
classes of transactions. 

• The degree of replication of individual (or group) data objects seems
to have a significant effect on transaction availabilities. Thus, when
different data objects have different numbers of copies, adopting the
average degree of replication to represent any object in a system may not
result in accurate availability evaluations.

• The actual distribution of data object copies has some, if not signifi- 
cant, impact on availability evaluation. 

• In a heterogeneous environment where different nodes may have dif- 
ferent reliabilities, it is sufficient to represent each node by the average 
node reliability, without affecting the availability evaluations. 

• Data object grouping (logical or physical) does not seem to affect the
accuracy of availability evaluations as long as the number of groups is
not too small (e.g., when $d = 1000$, $g > 25$ is sufficient).

Distributed database designers and researchers can utilize these results in
choosing appropriate parameters that would result in reduced computational
requirements without sacrificing the resulting accuracy of the design and
analysis of these systems.
Appendix


Model 3 $\langle A_1, B_2, C_1, D_1, E_1 \rangle$:

Here, we assume that each node has exactly the same number of data objects.

Model 4 $\langle A_1, B_1, C_2, D_1, E_1 \rangle$:

Here, each data object may have its own degree of replication specified.
For an efficient computation, we classify the data objects into $p$ categories
($1 \le p \le n$) based on their degree of replication. $d_l$ denotes the number of data
objects in the $l$-th category, where each object has $c_l$ ($1 \le c_l \le n$) copies.

The expressions for $TFA_s$, $TFA'_{x,y}$, and $TFA''_s$ are the same as in
Equations (26) - (28).


Model 5 $\langle A_1, B_1, C_1, D_2, E_1 \rangle$:

Here, we assume that the entire data distribution is available as a dictionary.
$$TSA''_s = [\ldots] \qquad (34)$$

$$\Pr(S) = r^{f'''(S)} (1-r)^{n - f'''(S)}$$

where

$S$ - Node status vector; $S_j = 1$ => Node $j$ is up; $S_j = 0$ => Node $j$ is down.
$f(S)$ - The number of data objects available for read with the given node
status vector ($S$). This is computed by scanning the columns of the GD
matrix corresponding to the live nodes (as given by $S$).
$f'(S)$ - The number of data objects available for update (i.e., all $c$ copies
of these data objects are available at the live nodes) with the given node
status vector ($S$). This is also computed by scanning the columns of the GD
matrix corresponding to the live nodes (as given by $S$).
$f''(S)$ - The number of data objects available with a majority of copies
among the available nodes. As before, this is computed by scanning the
columns of the GD matrix corresponding to the live nodes (as given by $S$).
$f'''(S)$ - The number of nodes available (or up) as indicated by the vector
$S$.
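To make the roles of the node status vector and the GD matrix concrete, the following sketch evaluates a read-only start availability by exhaustive enumeration of all $2^n$ status vectors. The expressions for Model 5 are not legible in this copy, so the per-vector term used here, $\binom{f(S)}{s} / \binom{d}{s}$ (the chance that $s$ objects chosen uniformly at random all lie among the $f(S)$ readable ones), is an assumption and not necessarily the authors' formula. The function name is also an assumption.

```python
from itertools import product
from math import comb


def tsa_read_only_known_distribution(gd, s, r):
    """Exhaustive evaluation over node status vectors.

    gd[i][j] is True when a copy of object i is stored at node j.
    s: read-set size, r: reliability of every node.
    """
    d, n = len(gd), len(gd[0])
    total = 0.0
    for status in product((0, 1), repeat=n):  # all 2^n status vectors
        up = sum(status)                      # number of live nodes
        pr_s = r**up * (1 - r) ** (n - up)    # probability of this vector
        # Objects readable under this vector: at least one copy on a live node.
        readable = sum(
            1 for row in gd if any(row[j] and status[j] for j in range(n))
        )
        total += pr_s * comb(readable, s) / comb(d, s)
    return total


identity = [[i == j for j in range(3)] for i in range(3)]  # one copy per object
print(tsa_read_only_known_distribution(identity, s=2, r=0.5))
```

The loop body scans the full matrix for each status vector, which mirrors the $O(2^n (nd + s))$ cost quoted for this model.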


Model 6 $\langle A_1, B_1, C_1, D_1, E_2 \rangle$:

Here each node may have its own reliability. For computational purposes, we
categorize the nodes based on their reliability. We assume that there are $q$
($1 \le q \le n$) such categories. We let $n_i$ represent the number of nodes
with reliability $r_i$, and $a_i$ represent the number of currently active (or up)
nodes with this reliability.
Figure 1. Transaction Start Availabilities for Read-only Transactions (with Read-one/Write-all Policy). Panels: 1e (n=10, d=1000, c=5, r=0.75); 1f (n=20, d=1000, c=3, r=0.75); 1g (n=10, d=1000, c=3, r=0.95).

Figure 3. Transaction Start Availabilities for Read-write Transactions (with Read-one/Write-all Policy). Panels: 3a (n=10, d=1000, c=3, r=0.75); 3b (n=10, d=1000, c=3, r=0.90); 3c (n=10, d=1000, c=3, r=0.99); 3d (n=10, d=10000, c=3, r=0.90).

Figure 5. Illustration of the Effect of Copy ... Panels: 5a (n=10, d=1000, c=3, r=0.5); 5e (n=10, d=1000, c=3, r=0.75).

Figure 7. Illustration of the Effect of Reliability Variations on Availability (Model 6). Panels: 7a (n=10, d=1000, c=3, avg. r=0.50); 7b (n=10, d=1000, c=3, avg. r=0.75).
Abstract

Data replication and transaction deadlocks can severely affect
the performance of distributed database systems. Many
current evaluation techniques ignore these aspects, because they
are difficult to evaluate through analysis and time-consuming to
evaluate through simulation. In this paper, we use a technique
that combines simulation and analysis to closely illustrate the
impact of deadlock and evaluate the performance of replicated
distributed databases with both shared and exclusive locks.


1. Introduction. 

A distributed database system (DDS) is a collection of
cooperating nodes, each containing a set of data items. A user
transaction can enter such a system at any of these nodes. The
receiving node, often referred to as the coordinating node,
undertakes the task of locating the nodes that contain the data
items required by a transaction.

In order to maintain database consistency and correctness
in the presence of concurrent transactions, several concurrency
control protocols have been proposed [1]. Of these, the most
commonly used are time-stamping and locking protocols. Locking
protocols have been widely used in both commercial and
research environments. In static locking, prior to the start of
execution, a transaction needs to acquire either a shared lock (for
read operations) or an exclusive lock (for update operations) on
each of the relevant data items.

Data replication is used to improve the performance of local
transactions and the availability of databases. In replicated
databases, one data item may have more than one copy in
the system. Replica control algorithms are used to maintain
consistency among these copies. One of these is the
read-one/write-all protocol. With this protocol, a transaction
requesting an exclusive lock needs to acquire an exclusive lock
on every copy of the data item. For a shared lock to succeed,
any one copy of the data item has to be share-locked. When
transactions with conflicting lock requests are initiated
concurrently, they could possibly be blocked due to a deadlock.

There are two major ways to evaluate the performance of
distributed systems: simulation and analysis. Simulation is a
conceptually tractable technique, but requires large computation
time. On the other hand, analysis is computationally faster
but may not be tractable for all problems. In [4], Shyu and Li
proposed an elegant analysis model to evaluate the response
time and throughput of transactions in a non-replicated DDS.
Assuming exclusive locking (i.e., only write operations), they
model the queue of lock requests at an object as an M/M/1
queue [3]. This results in a closed form for the waiting time
distribution at a node, expressed in terms of the average rates
of arrivals of requests and the average lock-holding time. With
shared locks and replication added into the picture, it is very
difficult to obtain a closed-form model. Because of the
limitations of simulation and analysis, we develop a technique that
combines simulation and analysis.

This paper is organized as follows. In Section 2, we describe
the model used in our performance evaluation. In Section
3, we propose an evaluation technique. In Section 4, we
illustrate the results. Finally, Section 5 has the conclusions.


2. Model 

Our model has the following parameters: 

• There are n nodes.

• There are d data items in a DDS.

• A data item is located at exactly c nodes.
The dc data copies are uniformly distributed across the n
nodes.

• Each transaction accesses k data items. 

• r is the read ratio. So among the k data items to be accessed,
rk are accessed only for read operations, and the rest
are for read-write operations. Due to the read-one/write-all
replica control policy, a transaction must procure rk
shared locks for the rk read operations and (1 - r)kc exclusive
locks for the (1 - r)k read-write operations.

• Each data item is equally likely to be accessed by a trans- 
action. 

• Transaction arrivals into the system form a Poisson process
with rate λ.

• The communication delay between any two nodes is ex- 
ponentially distributed with mean t. 

• The average execution time of a transaction, once the 
locks are obtained, is 3. 

• The deadlock detection mechanism is invoked every τ seconds.

• After a transaction is aborted, it takes an average of
ω seconds for the transaction to be restarted.

• μ is the service rate of transactions.

• b is the lock-holding time.

• λc is the arrival rate at each data copy.
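The lock counts implied by the read-one/write-all policy can be made concrete with a small sketch. The class and method names below are illustrative only and cover just the structural parameters n, d, c, k and r from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParams:
    """Structural parameters of the replicated DDS model (illustrative names)."""
    n: int    # number of nodes
    d: int    # number of data items
    c: int    # copies per data item
    k: int    # data items accessed per transaction
    r: float  # read ratio

    def shared_locks(self) -> float:
        # Read-one: one shared lock per item that is only read.
        return self.r * self.k

    def exclusive_locks(self) -> float:
        # Write-all: one exclusive lock per copy of each read-write item.
        return (1 - self.r) * self.k * self.c

    def copies_per_node(self) -> float:
        # The d*c copies are spread uniformly over the n nodes.
        return self.d * self.c / self.n
```

For example, with k = 6, c = 3 and r = 0.5, a transaction needs 3 shared locks and 9 exclusive locks, which shows how replication multiplies only the write side of the lock demand.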
3. Performance Evaluation Technique

Our technique consists of two stages. In the first stage, the
average transaction response time and throughput are calculated
by ignoring the deadlock. This is an iterative step involving
simulation and analysis. In the second stage, the probabilities
of transaction conflicts and deadlocks are computed by
probability models. These probabilities are used, in turn, to
compute the response time and throughput in the presence of
deadlocks.

Stage 1: 

Initially, we assume that there are no lock conflicts between
transactions. Each transaction has to procure rk shared locks
on data copies and (1 - r)kc exclusive locks on data copies.
When a transaction has received all the lock grants from these data
objects, it can go ahead with execution.

This procedure is summarized in the following 6 steps.

1. Initialize the lock-holding time (b) to 1/μ.

2. Given the total rate of transaction arrivals (λ), the shared
lock ratio (r), the number of data items (d), the number of
data items required by each transaction (k), and the number
of replications (c), derive the arrival rate at each data
copy (λc).

3. With the arrival rate at each data copy (λc), the average
lock-holding time (b), and the transmission time (t), simulate
the queue at a data copy to obtain the wait-time (w)
distribution. With this distribution, calculate the
response time of transactions.

4. With the average service time of transactions (1/μ) and
the transmission time, derive a new lock-holding
time (b').

5. Set b to this new lock-holding time b'.

6. If the old and new lock-holding times are sufficiently close,
stop the iteration. Otherwise, go back to step 3.

At the end of stage 1, the response time without the consideration
of transaction deadlocks is obtained.
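The control flow of the Stage 1 iteration can be sketched as a fixed-point loop. This is a minimal skeleton, not the authors' implementation: the two callables stand in for the queue simulation of step 3 and the lock-holding-time update of step 4, whose details are not given here, and all names are hypothetical.

```python
from typing import Callable, Tuple


def stage1_iterate(
    b0: float,
    mean_wait: Callable[[float], float],
    new_holding_time: Callable[[float], float],
    tol: float = 1e-6,
    max_iter: int = 1000,
) -> Tuple[float, float]:
    """Iterate the lock-holding time b until successive values agree.

    mean_wait(b)        -> average wait for holding time b (step 3, assumed)
    new_holding_time(w) -> updated holding time b' from wait w (step 4, assumed)
    Returns the converged holding time and the last computed wait.
    """
    b = b0  # step 1: initial holding time, e.g. 1/mu
    for _ in range(max_iter):
        w = mean_wait(b)
        b_new = new_holding_time(w)
        if abs(b_new - b) <= tol:  # step 6: old and new values are close
            return b_new, w
        b = b_new  # step 5: adopt the new holding time
    raise RuntimeError("lock-holding time did not converge")
```

Passing the simulation and the update rule as arguments keeps the loop independent of how the wait-time distribution is actually obtained.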

Stage 2: 

This stage considers transaction conflicts and computes the 
deadlock probability. Here the probabilities of transaction dead- 
lock and restart are computed. These are then used to compute 
response time and throughput in the presence of deadlocks. 

Assume there are two transactions T1 and T2. Let RS and WS
be the read and write sets of a transaction, respectively.

1. Let fs_i be the probability that the readset of T1 has i data
items overlapping with the writeset of T2, i.e.,
|RS(T1) ∩ WS(T2)| = i.

2. Let fe_{i,j} be the probability that, given |RS(T1) ∩ WS(T2)| = i,
the writeset of T1 has j data items overlapping with
the readset and writeset of T2, i.e., the probability that
|WS(T1) ∩ (RS(T2) ∪ WS(T2))| = j.


Clearly, 


fs, 




U) 

(VH^r-D 


a) 


(•2) 


It can also be noted that fs_i · fe_{i,j} is the probability that:
|Read-set(T1) ∩ Write-set(T2)| = i
∧ |Write-set(T1) ∩ (Write-set(T2) ∪ Read-set(T2))| = j.
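To give a feel for the overlap probability fs_i, the sketch below evaluates it under one explicit assumption: the readset of T1 and the writeset of T2 are drawn independently and uniformly from the d data items, so the overlap size is hypergeometric. This follows from the equal-access assumption of the model, but it is offered as an illustration and is not claimed to be the exact expression used in the text. The parameter names n_read and n_write (for rk and (1 - r)k as integers) are hypothetical.

```python
from math import comb


def fs(i: int, d: int, n_read: int, n_write: int) -> float:
    """P(|RS(T1) ∩ WS(T2)| = i) for uniformly drawn sets (hypergeometric).

    n_read  : size of the readset of T1 (rk, assumed to be an integer)
    n_write : size of the writeset of T2 ((1 - r)k, assumed to be an integer)
    """
    if i < 0 or i > min(n_read, n_write):
        return 0.0
    # Choose i of T2's written items and the remaining n_read - i reads elsewhere.
    return comb(n_write, i) * comb(d - n_write, n_read - i) / comb(d, n_read)
```

Summing fs over all feasible i gives 1, which is a quick sanity check on any candidate expression for this probability.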

If pw_{i,j} is the probability that T1 waits for T2, then

$$ pw_{i,j} = p_1 + p_2 - p_1 \cdot p_2 \tag{3} $$

pi 

= i-(i-(i/2) c r 

(•1) 

,,2 

= (1-U/2D 

(5) 


where p1 is the probability that T1 waits for T2 for shared locks
in the readset, and p2 is the probability that T1 waits for T2 for
exclusive locks in the writeset.

The probability that T1 waits for T2 is now given by

$$ P_w = \sum_{i=0}^{\min(rk,\,(1-r)k)} \; \sum_{j=0}^{\min((1-r)k,\,k-i)} fs_i \cdot fe_{i,j} \cdot pw_{i,j} \tag{6} $$
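The double sum for the waiting probability is a weighted average of the conditional waiting probability over all overlap sizes (i, j). A minimal sketch follows; the three callables are placeholders for fs, fe and the conditional waiting probability, and the summation limits are one reading of the bounds implied by the set sizes, so both should be treated as assumptions.

```python
from typing import Callable


def wait_probability(
    n_read: int,
    n_write: int,
    fs: Callable[[int], float],
    fe: Callable[[int, int], float],
    pw: Callable[[int, int], float],
) -> float:
    """Sum fs(i) * fe(i, j) * pw(i, j) over the feasible overlaps.

    n_read and n_write stand for rk and (1 - r)k, assumed to be integers.
    """
    k = n_read + n_write
    total = 0.0
    for i in range(min(n_read, n_write) + 1):
        for j in range(min(n_write, k - i) + 1):
            total += fs(i) * fe(i, j) * pw(i, j)
    return total
```

Keeping the three probability terms as arguments makes it easy to swap in different expressions for them without touching the summation.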

With this probability of waiting and the formulas in [4], we can
calculate the probability of a transaction deadlock, the probability
of a transaction restart, and the probability that a transaction
is blocked by other transactions. With these
probabilities and the time between deadlock detections (τ), we
can calculate the response time with consideration of deadlocks.
(Details are omitted here.)


4. Results 

Using this technique, we obtained a number of interesting 
results that illustrate the effect of deadlocks and number of 
replications on database performance. These are summarized 
in Figures 1-5. We make the following observations. 

• Transaction response times are quite sensitive to the ratio
of shared locks (Figures 1 and 2). Here, we compare the
response times when deadlocks are ignored (DI, computed in
Stage 1) with those obtained when deadlocks are considered
(DC, computed in Stage 2). The effect of deadlocks
is more predominant at higher transaction loads and with
smaller values of r. When r = 2/3, the effect of deadlocks
is not significant on response time.

• Comparing Figures 1 and 2 with Figures 3 and 4, it can
be observed that an increase in replication results in a
larger response time when the read ratio is smaller than 1/3.

• Figure 5 shows the response times for different numbers of
replications. In both cases, with read ratios of 2/3 and 1/3,
the response time increases as the number of replications
increases. However, with a read ratio of 1/3, the rate of
increase is much smaller than with a read ratio of 2/3.

5. Conclusions 

In [4], Shyu and Li presented an elegant technique to evaluate
the performance of distributed database systems in the
presence of deadlocks. Their technique assumed only exclusive
locks and thus represents the worst-case effects of deadlocks.
In this paper, we have extended their technique to combine
simulation and analysis. With this extended technique, we allow
both shared and exclusive locking as well as replication in our
model. We evaluated the effect of the number of data items, the
number of data items accessed by each transaction, the ratio of
read operations, and the number of replications on transaction
response time. These results show the importance of considering
both shared and exclusive lock requests, the deadlock
probabilities, and the number of replications of the database for
response time evaluations.

Figure 1. Comparison of response time with different read ratio when deadlock is ignored.

Figure 2. Comparison of response time with different read ratio when deadlock is considered.

Figure 3. Comparison of response time with different read ratio when deadlock is ignored.

Figure 4. Comparison of response time with different read ratio when deadlock is ignored.

Figure 5. Comparison of response time with different read ratio with and without deadlock. (Axes: response time (sec) versus replication number c. DC: Deadlock Considered. DI: Deadlock Ignored.)
Abstract

Even though transaction deadlocks can severely affect the perfor- 
mance of distributed database systems, many current evaluation tech- 
niques ignore this aspect. In [4], Shyu and Li proposed an evaluation 
method which takes deadlocks into consideration. However, their tech- 
nique is limited to exclusive locking. In this paper, we extend their 
technique to allow for both shared and exclusive locking. Using this 
technique, we illustrate the impact of deadlocks, in the presence of 
shared locking, on distributed database performance. 


Index Terms: Distributed databases, exclusive locking, performance mod- 
eling, shared locking, static locking, two-phase locking. 
1 Introduction


A distributed database system (DDS) is a collection of cooperating nodes 
each containing a set of data objects. A user transaction can enter such a 
system at any of these nodes. The receiving node, often referred to as the 
coordinating node, undertakes the task of locating the nodes that contain 
the data objects required by a transaction. 

In order to maintain database consistency and correctness in the pres- 
ence of concurrent transactions, several concurrency control protocols have 
been proposed [1]. Of these, locking protocols have been widely used in both 
commercial and research environments. In static locking, prior to start of 
execution, a transaction needs to acquire either a shared-lock (for read op- 
erations) or an exclusive lock (for update operations) on each of the relevant 
data objects. When transactions with conflicting lock requests are initiated 
concurrently, they could be possibly blocked due to a deadlock. Deadlocks 
are known to deteriorate performance in both centralized and distributed 
database systems [4,6]. In spite of this, several performance studies have 
ignored the deadlock problem in their analyses [2,5].

In [4], Shyu and Li proposed an elegant technique to evaluate the re- 
sponse time and throughput of transactions in a non-replicated DDS. (In the 
rest of the paper, we refer to this as the S-L technique.) Assuming exclusive 
locking (i.e., only write operations), they model the queue of lock requests
at an object as an M/M/1 queue [3]. This results in a closed form for the
waiting time distribution at a node, expressed in terms of the average rates
of arrivals of requests and the average lock-holding time. This technique 
consists of two stages. In the first stage, the average transaction response 
time and throughput are calculated by ignoring the deadlock. This is an 
iterative step that uses the known properties of the M/M/1 queue [3]. In 
the second stage, the probabilities of transaction conflicts and deadlocks are 
computed. These probabilities are used, in turn, to compute the response 
time and throughput in the presence of deadlocks. 

In general, a database transaction reads from a set of data objects (the 
read-set) and writes on to a set of data objects (the write-set). Assuming 
that all accesses are write-only (as in S-L) results in the worst-case per- 
formance (with respect to deadlocks and response time) of a DDS. In this 
paper, we propose to extend the S-L technique to consider both the read
and the write operations of database transactions. Using the extended S-L, 
we evaluate the effect of deadlocks on distributed database systems. 
2 Model


Except for the inclusion of read operations, our model is the same as in S-L. 

For the sake of completeness, we summarize the DDS model here. 

• There are N nodes and D data objects (or data granules in S-L) in 
a DDS. The D data objects are uniformly distributed across the N 
nodes. A data object may be located at exactly one node. 

• Each transaction accesses K data objects. Among these, r · K are
for read-only purposes, and the rest are for read-write. (Obviously,
0 ≤ r ≤ 1.) In other words, a transaction must procure r · K shared
locks and (1 - r) · K exclusive locks.

• Each data object is equally likely to be accessed by a transaction. 

• Transaction arrivals into the system form a Poisson process with rate λ.

• The communication delay between nodes is exponentially distributed 
with mean t. 

• The average execution time of a transaction, once the locks are ob- 
tained, is 5. 


3 Evaluation Procedure 

Since we are only proposing extensions to the S-L model, we do not intend to 
repeat the description of their procedure. Instead, we will discuss only the 
salient features of their procedure that are relevant to describe the proposed 
extensions. 

In Stage 1 of the S-L technique, an iterative procedure is used to
evaluate the response time and throughput of a DDS ignoring the possibility
of deadlocks. In each iteration, the average waiting time (for exclusive lock
requests) at each of the data objects is computed using estimates of the
average lock-holding times from the previous iteration. By definition, no two
exclusive lock requests can have lock grants on the same object
simultaneously. Also, assuming that the lock-holding time is exponentially
distributed (with mean 1/μ) and that the lock request arrivals form a Poisson
process (with rate λ_r = λ · K/D), the distribution of the waiting time W_i at
an object i is expressed as (M/M/1 queueing formula [3])

$$ f_{W_i}(y) = (1 - \rho)\, u_0(y) + \lambda_r (1 - \rho)\, e^{-\mu (1 - \rho) y} \tag{1} $$

where u_0(·) is the impulse function and ρ = λ_r/μ. Using the waiting time
distribution, the waiting times at the K data objects are randomly
generated. These are used, in turn, to derive new estimates for the lock-holding
times (1/μ). The iterations stop when two successive computations of
average waiting time estimates are very close.
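Random generation of waiting times from the M/M/1 distribution can be sketched as follows. The sketch uses the standard M/M/1 result that a request waits zero time with probability 1 - ρ and otherwise waits an exponentially distributed time with rate μ(1 - ρ); the function names and the use of Python's random module are choices made for this example only.

```python
import random


def mm1_mean_wait(lam_r: float, mu: float) -> float:
    """Mean M/M/1 waiting time in queue: rho / (mu * (1 - rho))."""
    rho = lam_r / mu
    if not 0.0 <= rho < 1.0:
        raise ValueError("queue is unstable unless 0 <= rho < 1")
    return rho / (mu * (1.0 - rho))


def sample_mm1_wait(lam_r: float, mu: float, rng: random.Random) -> float:
    """Draw one waiting time from the M/M/1 waiting-time distribution."""
    rho = lam_r / mu
    if rng.random() < 1.0 - rho:
        return 0.0  # impulse at zero: the object is free on arrival
    return rng.expovariate(mu * (1.0 - rho))  # exponential tail
```

A transaction that requests K objects would draw K such values, and the largest of them (plus communication delays) determines when its last lock is granted.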

When we consider both shared and exclusive locks, the problem of
estimating the waiting time distributions becomes difficult. Since two shared
lock grants on the same object may exist simultaneously, and an exclusive
lock may not be granted while another shared or exclusive lock is already
granted, the queueing discipline at a node is complex. Such complex
queueing disciplines are analytically intractable [3]. For this reason, we
propose to use simulation to solve the queueing model. Given the total rate of
arrival of lock requests λ_r, the shared lock ratio (r), and the average
lock-holding time (1/μ), the queue at an object may be simulated. From here, the
waiting time distribution may be obtained in the form of a table. Once the
waiting time distribution is obtained, the same iterative procedure as in Stage 1
of S-L may be adopted to compute the response time when deadlocks are
ignored. As in S-L, transaction response time is defined as the time between
the instant the lock requests are sent and the time the last lock grant
is received by the coordinating node.
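A simulation of the lock queue at a single object can be sketched as below. This is a simplified illustration, not the simulator used for the reported results: it assumes strict first-come-first-served granting, exponentially distributed holding times with mean 1/μ, and a request being shared with probability r. The function and variable names are hypothetical.

```python
import random
from typing import List


def simulate_lock_queue(
    lam_r: float, r: float, mu: float, n_requests: int, seed: int = 0
) -> List[float]:
    """Return the waiting time of each lock request at one data object.

    Shared requests may overlap with each other; an exclusive request must
    wait for every earlier request to release (strict FCFS, assumed here).
    """
    rng = random.Random(seed)
    t = 0.0                  # arrival time of the current request
    last_excl_release = 0.0  # release time of the latest exclusive request
    max_release = 0.0        # latest release time among all earlier requests
    waits = []
    for _ in range(n_requests):
        t += rng.expovariate(lam_r)  # Poisson arrivals
        hold = rng.expovariate(mu)   # lock-holding time
        if rng.random() < r:
            # Shared lock: only earlier exclusive locks can delay it.
            grant = max(t, last_excl_release)
        else:
            # Exclusive lock: all earlier locks must be released first.
            grant = max(t, max_release)
            last_excl_release = grant + hold
        max_release = max(max_release, grant + hold)
        waits.append(grant - t)
    return waits
```

Sorting the returned waits gives an empirical distribution table. With r = 0 the queue reduces to the exclusive-only M/M/1 case, and with r = 1 no request ever waits, which gives two simple checks on the simulator.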

In Stage 2, the probabilities of transaction deadlock and restart are com- 
puted. These are then used to compute response time and throughput in 
the presence of deadlocks. When we assume that transactions only make 
exclusive lock requests, the expression for the probability of conflict between 
any two transactions is given by, 


$$ P_c = 1 - \frac{\binom{D-K}{K}}{\binom{D}{K}} \tag{2} $$


However, when we consider both shared locks and exclusive locks, the
probability of conflict is reduced. In this case the probability of conflict
is given by,


Ft 


, ( D -/) gg, (T) ■ (£•*) • 

(?) ' (?>•© 


( 3 ) 


where K' = r · K represents the average number of shared locks; (K - K')
is the average number of exclusive locks per transaction. Clearly, when
r = 0, P_c = P'_c; when r = 1, P'_c = 0; and in all cases, P_c ≥ P'_c.

By replacing P_c with P'_c, the procedure suggested in S-L may be applied
to obtain the desired performance metrics.
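The boundary behaviour of the two conflict probabilities can be checked numerically. The sketch below computes P_c from the exclusive-only formula and derives a mixed-lock conflict probability by direct counting. The counting argument is an assumption of this example and is not claimed to reproduce the exact expression in the text: each transaction is taken to have a read-set of K' objects and a disjoint write-set of K - K' objects, all drawn uniformly, and two transactions conflict when one writes an object that the other reads or writes. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def p_conflict_exclusive(D: int, K: int) -> Fraction:
    """P_c: two random K-subsets of D objects share at least one object."""
    return 1 - Fraction(comb(D - K, K), comb(D, K))


def p_conflict_mixed(D: int, K: int, K_shared: int) -> Fraction:
    """Conflict probability with K_shared shared and K - K_shared exclusive locks.

    Counts the ways T2 can choose its read-set and write-set without any
    write overlapping T1's objects and without reading T1's write-set.
    """
    if not (0 <= K_shared <= K and 2 * K <= D):
        raise ValueError("need 0 <= K_shared <= K and D >= 2K")
    K_excl = K - K_shared
    total = comb(D, K_shared) * comb(D - K_shared, K_excl)
    no_conflict = sum(
        comb(K_shared, i)                        # reads shared with T1's reads
        * comb(D - K, K_shared - i)              # reads outside T1's objects
        * comb(D - K - K_shared + i, K_excl)     # writes outside T1 and own reads
        for i in range(K_shared + 1)
    )
    return 1 - Fraction(no_conflict, total)
```

Under these assumptions the mixed-lock value equals P_c when K' = 0 and is 0 when K' = K, which matches the two limiting cases stated above.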
4 Results


Using the extended S-L technique, we obtained a number of interesting 
results that illustrate the effect of deadlocks on database performance. These 
are summarized in Figures 1-5. We have verified our results with those 
obtained in [4] for the all exclusive locks case (r = 0). We make the following 
observations. 

• As expected, the presence of shared locks has a substantial impact on
the probability of deadlock occurrence (Fig. 1). When only 1/3 of
the accessed data objects are updated (i.e., r = 2/3), the probability
of deadlock is considerably smaller than when all objects are
updated (r = 0).

• The observations about the deadlock probabilities are also valid for 
restart probabilities (Fig. 2). 

• Transaction response times are also quite sensitive to the ratio of 
shared locks (Fig. 3). Here, we compare the response times when 
deadlocks are ignored (computed in Stage 1) with those obtained when 
deadlocks are considered (computed in Stage 2). The effect of dead- 
locks is more predominant at higher transaction loads and with smaller 
values of r. When r = 2/3, the effect of deadlocks is not significant on 
response time. 

• The effect of deadlocks on response time decreases as the number
of data items increases (Fig. 4). This is due to the
decrease in the probability of conflicts and hence a decrease in deadlock
occurrence. For r = 2/3, this effect is almost insignificant. For r = 1/3
and r = 0, deadlocks seem to have a noticeable effect on response
time.

• Fig. 5 summarizes the effect of the number of locks per transaction on
response time. When K is small, the probability of deadlock is
negligible, and hence its effect on response time is small. At higher values
of K, the effect of deadlocks on response times is significant. Similarly,
at smaller values of r, the effect of deadlocks is more apparent.
5 Conclusion


In [4], Shyu and Li presented an elegant technique to evaluate the
performance of distributed database systems in the presence of deadlocks. Their
technique assumed only exclusive locks and thus represents the worst-case
effects of deadlocks. In this paper, we have extended their technique to
allow both shared and exclusive locking. Using the extended technique, we
evaluated the effect of the number of data objects, the number of data
objects accessed, and the ratio of read operations on transaction response time.
These results also indicate the importance of considering both shared and
exclusive lock requests for response time evaluations.